
FTL times out when building on Cloud Builder

Open · myelin opened this issue 6 years ago · 12 comments

The following requirements.txt took three tries to build within the default 10-minute Cloud Builder timeout (getting a little further each time):

gunicorn
uwsgi
django
flask
google-cloud
requests
cryptography
pillow

Running this locally takes 17 seconds with a hot pip cache, or 61 seconds after clearing the cache with rm -rf ~/.cache/pip.

Test command: time bash -c 'rm -rf env; python3 -m virtualenv -p python3 env; source env/bin/activate; pip install -r requirements.txt'

myelin, Apr 24 '18 23:04

Can you describe the environment/builder image you are using? When trying the same requirements.txt, I see:

Step #2 - "build-image-gcr.io/ftl-node-test/repro_test-image:latest": INFO     Pushing image to Docker registry took 4 seconds
Step #2 - "build-image-gcr.io/ftl-node-test/repro_test-image:latest": INFO     Uploading final image took 4 seconds
Step #2 - "build-image-gcr.io/ftl-node-test/repro_test-image:latest": INFO     build process for FTL image took 392 seconds
Step #2 - "build-image-gcr.io/ftl-node-test/repro_test-image:latest": INFO     full build took 392 seconds

All deps are verified in the image. I used Python 2.7 for this test; retrying with Python 3.

aaron-prindle, Apr 24 '18 23:04

Running with Python 3.6:

Step #2 - "build-image-gcr.io/ftl-node-test/repro_test-image:latest": INFO     Pushing image to Docker registry took 2 seconds
Step #2 - "build-image-gcr.io/ftl-node-test/repro_test-image:latest": INFO     Uploading final image took 2 seconds
Step #2 - "build-image-gcr.io/ftl-node-test/repro_test-image:latest": INFO     build process for FTL image took 275 seconds
Step #2 - "build-image-gcr.io/ftl-node-test/repro_test-image:latest": INFO     full build took 275 seconds

aaron-prindle, Apr 25 '18 00:04

Step #1 - "ftl": INFO pip_download_wheels took 281 seconds
Step #1 - "ftl": INFO build process for FTL image took 517 seconds
Step #1 - "ftl": INFO full build took 517 seconds

I'll email you the rest of the details.

myelin, Apr 25 '18 00:04

@myelin thanks for reporting this. Build times per package definitely rose when we switched to uploading each package as its own layer for phase 1.5 Python. #607 should fix this by uploading the layers in parallel. With it, I now see these times:
Test requirements.txt: https://gist.github.com/aaron-prindle/758f9ef4141669e19907b17da40f657d

Timing: https://gist.github.com/aaron-prindle/71cb60b604cd8c154205825f5b0094a0

Important pieces:
pip_download_wheels took 58 seconds
uploading_all_package_layers took 13 seconds
uploading_requirements.txt_pkg_lyr took 12 seconds
full build took 122 seconds
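For readers unfamiliar with the fix, here is a minimal Python sketch of uploading per-package layers concurrently instead of one after another. It is not the code from #607: `upload_layer` is a hypothetical stand-in that only sleeps to simulate registry latency, and the digests it returns are placeholders. `ThreadPoolExecutor.map` returns results in input order, so the layer order stays deterministic even though the uploads overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def upload_layer(package_name: str) -> str:
    """Hypothetical stand-in for pushing one package layer to a registry."""
    time.sleep(0.1)  # simulate network latency per upload
    return f"sha256:{package_name}"  # placeholder, not a real digest


def upload_layers_sequential(packages):
    # One upload at a time: total time grows linearly with package count.
    return [upload_layer(p) for p in packages]


def upload_layers_parallel(packages, max_workers=8):
    # Uploads overlap; map() still yields results in the input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(upload_layer, packages))


if __name__ == "__main__":
    pkgs = ["gunicorn", "uwsgi", "django", "flask", "requests", "cryptography", "pillow"]
    for fn in (upload_layers_sequential, upload_layers_parallel):
        start = time.perf_counter()
        digests = fn(pkgs)
        print(f"{fn.__name__}: {time.perf_counter() - start:.2f}s for {len(digests)} layers")
```

With simulated latency, the parallel version's wall time is bounded by the slowest batch of uploads rather than their sum; real gains depend on registry and network limits.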

I've opened an issue describing how we might be able to reduce the pip_download_wheels time: #610

aaron-prindle, Apr 25 '18 20:04

Another data point: when I ran a pip install of all the packages in our Dockerfile through Argo, the installation took 4 minutes 27 seconds in total.

rahulrv1980, Apr 25 '18 21:04

Can the wheel downloads be done in parallel?

rahulrv1980, Apr 25 '18 21:04

> Can the wheel downloads be done in parallel?

This is the overarching problem of phase 1 vs. phase 2: requirements.txt vs. Pipfile.

You can't download the wheels in parallel, because pip only discovers a package's dependencies after it has fetched that package. Parallel downloads would be possible if the fully resolved list of dependencies were known from the beginning, but currently it isn't. Downloading the known wheels (the ones listed explicitly) in parallel might still be worth investigating.
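The ordering constraint can be made concrete with a small breadth-first model of dependency discovery. This is a simplified sketch, not how pip works internally: `FAKE_INDEX` is a hypothetical, hand-written dependency map (illustrative, not pinned to any real release), and `fetch` stands in for downloading a package and reading its metadata. Packages within one round are independent and could be fetched in parallel, but each new round only becomes known once the previous round has been fetched.

```python
# Hypothetical dependency map: name -> direct dependencies.
# In reality these are only known after fetching each package's metadata.
FAKE_INDEX = {
    "flask": ["werkzeug", "jinja2", "click", "itsdangerous"],
    "jinja2": ["markupsafe"],
    "requests": ["urllib3", "idna", "certifi", "chardet"],
    "werkzeug": [], "click": [], "itsdangerous": [], "markupsafe": [],
    "urllib3": [], "idna": [], "certifi": [], "chardet": [],
}


def fetch(name):
    """Simulate downloading a package; its dependencies are revealed afterwards."""
    return FAKE_INDEX[name]


def discover_in_rounds(top_level):
    """Return the list of packages fetched in each discovery round.

    Everything inside one round could be fetched concurrently, but round
    N+1 cannot start until round N's dependencies have been read.
    """
    seen = set(top_level)
    rounds = []
    current = list(top_level)
    while current:
        rounds.append(current)
        next_round = []
        for name in current:
            for dep in fetch(name):
                if dep not in seen:
                    seen.add(dep)
                    next_round.append(dep)
        current = next_round
    return rounds


rounds = discover_in_rounds(["flask", "requests"])
for i, names in enumerate(rounds):
    print(i, names)
```

Only the first round (the explicitly listed requirements) is known up front; the depth of the dependency graph, not the number of packages, sets the minimum number of sequential steps.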

aaron-prindle, Apr 25 '18 21:04

It seems that it isn't possible after all, since for common shared libraries a parallel download could introduce a race condition over which package gets used: https://github.com/pypa/pip/issues/825#issuecomment-301968349
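One way a shared-library race can show up is sketched below as a simplified model, not the exact scenario discussed in the pip issue. `INDEPENDENT_RESOLUTIONS` holds hypothetical results of resolving each top-level package on its own, with made-up version numbers. If those results are merged in whatever order parallel jobs happen to finish, the version of `shared` that wins depends on timing; `find_conflicts` shows the deterministic alternative of detecting the disagreement instead of silently picking one.

```python
# Hypothetical per-package resolutions; all names and versions are made up.
INDEPENDENT_RESOLUTIONS = {
    "pkg_a": {"pkg_a": "1.0", "shared": "2.0"},
    "pkg_b": {"pkg_b": "3.1", "shared": "1.5"},
}


def merge_last_writer_wins(completion_order):
    """Naive merge: whichever resolution finishes last overwrites earlier pins."""
    merged = {}
    for top in completion_order:
        merged.update(INDEPENDENT_RESOLUTIONS[top])
    return merged


def find_conflicts(resolutions):
    """Return packages that independent resolutions pinned to different versions."""
    pins = {}
    for resolved in resolutions.values():
        for name, version in resolved.items():
            pins.setdefault(name, set()).add(version)
    return {name: sorted(versions) for name, versions in pins.items() if len(versions) > 1}


# Completion order decides which "shared" ends up installed:
print(merge_last_writer_wins(["pkg_a", "pkg_b"])["shared"])  # 1.5
print(merge_last_writer_wins(["pkg_b", "pkg_a"])["shared"])  # 2.0
print(find_conflicts(INDEPENDENT_RESOLUTIONS))  # {'shared': ['1.5', '2.0']}
```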

aaron-prindle, Apr 25 '18 21:04

It might actually be all right: https://github.com/pypa/pip/issues/825#issuecomment-302020952 https://github.com/pypa/pip/issues/825#issuecomment-319148864

I'm going to do a POC and investigate further.
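A proof of concept along the lines of "download the known wheels in parallel" could look roughly like the Python sketch below; it is a hypothetical illustration, not the code from PR #614. It starts one `pip wheel --wheel-dir DIR REQ` process per top-level requirement on a thread pool. Each process still resolves its own dependencies sequentially, and processes that share a dependency may download it more than once. `dry_run=True` returns the commands without running them.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def wheel_command(requirement, wheel_dir):
    # `pip wheel --wheel-dir DIR REQ` puts wheels for REQ and its deps into DIR.
    return [sys.executable, "-m", "pip", "wheel", "--wheel-dir", wheel_dir, requirement]


def download_wheels_parallel(requirements, wheel_dir, max_workers=4, dry_run=False):
    """Run one `pip wheel` process per top-level requirement concurrently.

    Hypothetical sketch: returns the commands when dry_run is True,
    otherwise the return code of each pip process (in input order).
    """
    commands = [wheel_command(r, wheel_dir) for r in requirements]
    if dry_run:
        return commands
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda cmd: subprocess.run(cmd, check=False), commands))
    return [r.returncode for r in results]


if __name__ == "__main__":
    for cmd in download_wheels_parallel(["flask", "requests"], "/tmp/wheels", dry_run=True):
        print(" ".join(cmd))
```

Because each pip process resolves independently, this approach can only overlap the top-level fetches; shared transitive dependencies are not coordinated between processes.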

aaron-prindle, Apr 25 '18 21:04

Another data point: removing the Google client libraries cuts the time from about 4 minutes to 1 minute.

rahulrv1980, Apr 25 '18 22:04

With the POC: https://github.com/GoogleCloudPlatform/runtimes-common/pull/614

Test requirements.txt https://gist.github.com/aaron-prindle/758f9ef4141669e19907b17da40f657d

Timing: https://gist.github.com/aaron-prindle/0764d32a928cc5002f6a6eb46c4c36f0

Important pieces: pip_download_wheels took 68 seconds

The timing is the same with or without the parallel downloads, so I believe pip wheel already performs whatever parallelization is possible.

aaron-prindle, Apr 25 '18 22:04

@rahulrv1980 yes, it seems that downloading the wheels for the Google client libraries alone takes ~1 minute.

aaron-prindle, Apr 25 '18 22:04