docker-py
Is there a reason why `client.images.build` doesn't stream output logs ?
This seems to be a recurring question. I looked at the code, and I see no reason why the higher-level API couldn't return the output logs as they come in. This could be achieved by returning a sort of fake future for the image object from the build function, alongside a generator of log chunks. Something like this (very rough sketch):
def build(self, **kwargs):
    resp = self.client.api.build(**kwargs)
    if isinstance(resp, six.string_types):
        return self.get(resp)
    result_stream, internal_stream = itertools.tee(json_stream(resp))
    # Filled in with the image once the generator has been fully consumed.
    result = {}

    def gen():
        # Initialised inside the generator: assigning to these names here
        # would otherwise make them unbound locals (UnboundLocalError).
        last_event = None
        image_id = None
        for chunk in internal_stream:
            if 'error' in chunk:
                raise BuildError(chunk['error'], result_stream)
            if 'stream' in chunk:
                match = re.search(
                    r'(^Successfully built |sha256:)([0-9a-f]+)$',
                    chunk['stream']
                )
                if match:
                    image_id = match.group(2)
                else:
                    yield chunk
            last_event = chunk
        if image_id:
            result['image'] = self.get(image_id)
        else:
            raise BuildError(last_event or 'Unknown', result_stream)

    return result, gen()
Maybe this could be switched on by adding an option to build, for example `stream_output`?
It could then be used like this:
image, output_iter = docker_client.images.build(fileobj=fd, tag=args.tag)
for line in output_iter:
    print(line)
print(image)
Of course, the nicer solution would be to use asyncio, but then it becomes more complicated :P
The original design was that because it was the high-level client, the result (image object or exception in case of failure) was more important to the user than the logs - and if that wasn't the case, one could still use the low-level API.
As you remark, it's been a recurring theme on this tracker for a while now, and we've tried to compromise somewhat already by returning the logs after the operation finishes. Maybe we'll look at doing something better in 4.0.
I understand the idea behind this division; it just feels silly to duplicate the high-level API source code in order to parse the output from the low-level API (I want to get the image ID). Maybe extracting the parsing from the high-level API and making it available as some sort of utility would be useful then (output event -> parsed event<Error|ImageId|Log>)?
Ah, never mind, I don't actually need to parse anything: I can just use `client.images.get(name)`, so there is no need for the image ID.
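The "output event -> parsed event" utility suggested above could look roughly like the sketch below. `BuildEvent` and `parse_build_events` are hypothetical names, not part of docker-py; the regex is the one from the rough sketch earlier in this thread, and the input is assumed to be build output already decoded into dicts.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Same pattern as in the rough sketch earlier in this thread.
_IMAGE_ID_RE = re.compile(r'(^Successfully built |sha256:)([0-9a-f]+)$')


class BuildEvent(NamedTuple):
    """Hypothetical parsed event; kind is 'error', 'image_id' or 'log'."""
    kind: str
    value: str


def parse_build_events(chunks: Iterable[dict]) -> Iterator[BuildEvent]:
    """Classify decoded build-output chunks one at a time, as they arrive."""
    for chunk in chunks:
        if 'error' in chunk:
            yield BuildEvent('error', chunk['error'])
        elif 'stream' in chunk:
            text = chunk['stream']
            match = _IMAGE_ID_RE.search(text)
            if match:
                yield BuildEvent('image_id', match.group(2))
            else:
                yield BuildEvent('log', text)
        # Chunks without 'error' or 'stream' (e.g. 'aux') are skipped.
```

Because it is a generator, a caller can print `log` events immediately and remember the last `image_id` event for a later `client.images.get(...)`.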
thanks for the great library by the way :) loving it
Hit this one as well - need to show logs as they come. Ended up using low level API: from_env().api.build()
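For anyone landing here, a minimal sketch of that low-level route, assuming `api` behaves like `docker.from_env().api` and that `build()` accepts `decode=True` to yield dicts instead of raw JSON. `stream_build` is a hypothetical helper name, and raising `RuntimeError` on an error chunk is an arbitrary choice made for this sketch.

```python
def stream_build(api, **build_kwargs):
    """Yield build log lines as they arrive from the low-level API.

    `api` is assumed to expose build(**kwargs) like docker's APIClient.
    """
    for chunk in api.build(decode=True, **build_kwargs):
        if 'error' in chunk:
            # Arbitrary choice for this sketch: surface the failure at once.
            raise RuntimeError(chunk['error'])
        text = chunk.get('stream', '').rstrip()
        if text:
            yield text


# Possible usage (needs a running Docker daemon, so left as a comment):
#
#   import docker
#   client = docker.from_env()
#   for line in stream_build(client.api, path=".", tag="master"):
#       print(line)
#   image = client.images.get("master")
```

Looking the image up by tag afterwards avoids having to parse the image ID out of the stream.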
The problem I find with returning the logs only after the operation finishes is that if the build fails for some reason, I'd like to read the logs to determine why. However, that is not possible:
try:
    image, live_log_generator = client.images.build(path=".", tag="master")
except docker.errors.BuildError as e:
    logger.error(e)
    for line in live_log_generator:  # oops, live_log_generator is unassigned
        logger.info(line)
The other thing is that when using the low-level build API, one is basically forced to reimplement the gory details of docker/models/images.py#build(), i.e. matching the regex so that the image ID can be determined (in order to construct the Image instance) and grepping for error patterns. To avoid that, one has to resort to ugly workarounds like this:
try:
    image, _ = client.images.build(path=".", tag="master")
except docker.errors.BuildError as e:
    logger.error(e)
    # Rebuild the image using low level API, hoping the failure is reproducible.
    live_log_generator = client.api.build(path=".", tag="master")
    for line in live_log_generator:
        line_dict = json.loads(line)
        if line_dict.get("stream"):
            logger.error(line_dict["stream"].rstrip())
which doubles the time needed to diagnose the problem.
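One possible way to avoid the second build, assuming the installed docker-py release exposes the collected output as `build_log` on `BuildError` (the rough sketch earlier in this thread passes the stream as the second argument to `BuildError`, but check your version): read the log from the exception itself. `build_log_lines` is a hypothetical helper, and it assumes the log entries are dicts already decoded from JSON.

```python
def build_log_lines(build_log):
    """Yield the non-empty text of each 'stream' entry in a build log."""
    for chunk in build_log:
        # Assumption: entries are decoded dicts; anything else is skipped.
        text = chunk.get('stream') if isinstance(chunk, dict) else None
        if text and text.strip():
            yield text.rstrip()


# Possible usage (needs docker-py and a Docker daemon, so left as a comment):
#
#   try:
#       image, _ = client.images.build(path=".", tag="master")
#   except docker.errors.BuildError as e:
#       logger.error(e)
#       for line in build_log_lines(e.build_log):  # assumed attribute
#           logger.error(line)
```

This still only shows the output after the build has failed, so it does not replace live streaming.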
Thanks @haizaar and @shin-, this thread is old but gold :)