Proposal: send logs directly from builder pods back to the builder
Note: I believe others have suggested a similar or identical solution to this problem in the past. Hopefully this issue solidifies those ideas.
Related: https://github.com/deis/builder/pull/185, https://github.com/deis/builder/issues/199, #298
Problem Statement
As of this writing, the builder does the following to perform a build:
- Launch a builder pod (slugbuilder or dockerbuilder)
- Poll the k8s API for the pod's existence
- Begin streaming pod logs after the pod exists
We've found issues with this approach, all of which stem from the fact that the pod may never be reported as running during any polling event. This is a race condition, and so far we've observed the following symptoms:
- The pod has started and completed inside of one polling interval
  - Attempted solution in https://github.com/deis/builder/pull/185. Note that this will not address the second symptom below
- The pod has started, completed, and been garbage collected inside of one polling interval
  - Temporary fix, which relies on the internal k8s GC implementation: https://github.com/deis/builder/pull/206
Solution Details
Because of this race condition, we can't rely on polling, and even if we successfully use the event stream (#185), k8s GC doesn't guarantee that pod logs will still be available after the pod is done. This proposal calls for the builder pod to stream its logs back to the builder that launched it.
These are the changes (as of this writing) that would need to happen to make this work:
- Each git-receive hook process runs a websocket server (on a unique port, assigned by the builder SSH server) that accepts incoming logs from the builder pod. It uses these logs for the following purposes:
  - Write them to STDOUT (for the builder to write back to the SSH connection)
  - Look for a `FINISHED` message that indicates the builder pod is done
- Each git-receive process launches builder pods with its "phone-home" IP and port, the address of the websocket server to which they should write their logs
- The builder pods now include a program that launches the builder logic (a shell script for slugbuilder and a Python program for dockerbuilder). This program's purpose is to:
  - Stream STDOUT & STDERR via a websocket connection to the phone-home address
  - Send a `FINISHED` message when the builder logic exits
After the builder's git-receive hook receives the `FINISHED` message, or after a generous timeout, it can shut down the websocket server and continue with the logic it already has. If this proposal were implemented, the builder would no longer need to poll the k8s API.
We are anyway thinking about implementing Jobs, which might change a lot of this behavior. Also, a pod getting garbage collected immediately, without the event type changing, is not expected k8s behavior. The intended event sequence is:
- Added: a pod is created
- Modified: status changes from Pending to Running
- Deleted: status is Succeeded (or a terminal status with an error code)

Because of some labels mess, we are not observing the pod's status change from Pending to Running; instead, GC starts collecting the pod and the event is Deleted directly, even though the pod is running. That is not the intended behavior. There is no point in streaming the logs back if the pod is garbage collected in the middle of an execution.
https://github.com/deis/builder/pull/185 will solve a lot of this. I feel there is no need for a special websocket connection to stream logs back.
@smothiki I'm not sure how #185 would solve this particular problem if we don't launch jobs. However, I am :+1: on using jobs for our builds when they come out of extensions. If I understand http://kubernetes.io/v1.1/docs/user-guide/jobs.html correctly, we'll be able to make an API call to get the logs of the job even if it's complete at the time of calling.
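To make the Jobs alternative concrete, here is a hypothetical Job manifest for a build, using the `extensions/v1beta1` API group that Jobs lived in around k8s 1.1. The name, labels, and image reference are all illustrative, not taken from the builder:

```yaml
# Illustrative Job spec for a single build run. With a Job, the completed
# pod's logs remain retrievable through the API, so the builder would not
# have to race the pod's garbage collection.
apiVersion: extensions/v1beta1
kind: Job
metadata:
  name: slugbuild-myapp        # hypothetical name
spec:
  template:
    metadata:
      labels:
        heritage: deis         # hypothetical label
    spec:
      restartPolicy: Never     # a build should run exactly once
      containers:
      - name: builder
        image: quay.io/deis/slugbuilder   # illustrative image reference
```

The key property for this issue is that the Job object (and its pod's logs) outlives the build itself, which is what the polling approach cannot guarantee.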
promoting to beta3
Punting to beta4
This issue was moved to teamhephy/builder#31