autoware icon indicating copy to clipboard operation
autoware copied to clipboard

refactor(docker): introduce `src-imported` stage

Open youtalk opened this issue 9 months ago • 9 comments

Description

This is the resubmission of #4709 to push the upstream src-imported branch.

This PR adds the src-imported stage to the top stage of the Dockerfile. This allows the base and src-imported stages to be processed in parallel, reducing the download latency of the vcs import command. It also eliminates the need to run the vcs import command again in the runtime stage. The trial in my local environment speeded up the docker build by about 5 minutes.

This PR does not cut out rosdep install, based on the reflection of the previous PR: https://github.com/autowarefoundation/autoware/pull/4656#issuecomment-2082095239

Tests performed

Not applicable.

Effects on system behavior

Not applicable.

Interface changes

Pre-review checklist for the PR author

The PR author must check the checkboxes below when creating the PR.

In-review checklist for the PR reviewers

The PR reviewers must check the checkboxes below before approval.

Post-review checklist for the PR author

The PR author must check the checkboxes below before merging.

  • [ ] There are no open discussions or they are tracked via tickets.

After all checkboxes are checked, anyone who has write access can merge the PR.

youtalk avatar May 13 '24 12:05 youtalk

We are waiting for:

  • https://github.com/autowarefoundation/autoware/actions/runs/9062897702 to succeed, then we can merge.

xmfcx avatar May 13 '24 12:05 xmfcx

no-cuda was succeeded but cuda was failed...

Unhandled exception. System.IO.IOException: No space left on device : '/home/runner/runners/2.316.1/_diag/Worker_20240513-205133-utc.log'

youtalk avatar May 13 '24 23:05 youtalk

@xmfcx Do you remember if put the fix for "No space left on device" for the cuda workflow?

mitsudome-r avatar May 14 '24 08:05 mitsudome-r

This is something new. In this job: https://github.com/autowarefoundation/autoware/actions/runs/9062897702/job/24897719154#step:3:1835

AFTER CLEAN-UP:

$ dh -h /

Filesystem      Size  Used Avail Use% Mounted on
/dev/root        73G   21G   53G  28% /

There is 53GB space, which should be plenty.

But the job also took 6h 4m 59s. which is not normal.

xmfcx avatar May 14 '24 11:05 xmfcx

https://github.com/autowarefoundation/autoware/actions/runs/8921431056 2024-05-14_14-31

The last time this ran, it took about 2.5 hours.

xmfcx avatar May 14 '24 11:05 xmfcx

First of all I'm going to speed up the CI build with cache and then come back here. We should solve the speed problem first.

youtalk avatar May 14 '24 14:05 youtalk

@youtalk If you COPY the source code from a different layer then you will not be able to remove it in the next RUN command which is an another layer. So that means the src files would still take space on the docker cache even you are deleting them with rm . So in here there is a trade-off between reducing the build time by 5 minutes vs adding whole source files into runtime image. I think we should be in favor of the reduced-size of the runtime image.

oguzkaganozt avatar May 15 '24 14:05 oguzkaganozt

@oguzkaganozt Oh, it's good point of view.

https://github.com/autowarefoundation/autoware/pull/4656 might be better approach. The PR calls both of vcs import and rosdep install only once. There is no side effect you mentioned.

However we need to the difference of the packages between --dependency-types build and --dependency-types exec. https://github.com/autowarefoundation/autoware/pull/4656#issuecomment-2082095239

rosdep install --dependency-types build
rosdep install --dependency-types exec

I'm investigating. Please wait a while.

youtalk avatar May 16 '24 13:05 youtalk

By generating the install package lists and copy the list only on the runtime step, this PR has no longer side effect of the runtime image size. https://github.com/autowarefoundation/autoware/pull/4712/commits/2d70277670ccd08630854819d6f34f1aff3bb98e

@oguzkaganozt Please review again. @xmfcx This PR have been succeeded to run both of the docke-build-and-push-main (no-cuda) and docke-build-and-push-main (cuda). Please review again. https://github.com/youtalk/autoware/actions/runs/9143787390/job/25140910449?pr=10 https://github.com/youtalk/autoware/actions/runs/9143787390/job/25140910551?pr=10

youtalk avatar May 19 '24 14:05 youtalk

Thanks @youtalk now it's looking really good

oguzkaganozt avatar May 20 '24 08:05 oguzkaganozt

Why not adding --mount=type=cache,target=/autoware/src for corresponding RUN statements?

This wouldn't be simpler?

doganulus avatar May 20 '24 12:05 doganulus

@doganulus It's not cache. We need to manage the source code change to rerun docker build. That's why I made the subsequent PR https://github.com/autowarefoundation/autoware/pull/4738.

youtalk avatar May 20 '24 21:05 youtalk

@youtalk If you run vcs import on the cache, it will update it. Yes, it is a cache.

It is better to use the container's own mechanisms than rolling out our custom caches whenever possible. I think it is possible. Something like that:

ARG AUTOWARE_VERSION=20240521
ENV AUTOWARE_VERSION=${AUTOWARE_VERSION}
...
RUN --mount=type=cache,target=/src/autoware,id=/src/autoware/${AUTOWARE_VERSION} \
    vcs import ...

doganulus avatar May 21 '24 06:05 doganulus