
fix requirements installed by dataproc_init.sh

Open willkg opened this issue 4 years ago • 3 comments

dataproc_init.sh runs pip 20.3.1 with the new resolver. The requirements it's trying to install don't gel well with one another, so the resolver takes effectively forever to figure out a set of packages that meets them.

https://github.com/mozilla/telemetry-airflow/issues/1198 covers changing dataproc_init.sh to use pip<20.3.0, which puts us back on the old resolver.

This issue covers fixing our requirements so that they don't cause the infinite-time-to-resolve problem.

willkg avatar Dec 08 '20 15:12 willkg

The pip issue suggests switching to a requirements.in file and using pip-tools to compile it, which pushes the dependency resolution step out of installation time. I did that with socorro and tecken a while back. We could do that here.

I don't know where the requirements file is coming from. It's this line in dataproc_init.sh:

https://github.com/mozilla/telemetry-airflow/blob/60f7dd1236193da4cfb73ab71210fb129b9638df/dataproc_bootstrap/dataproc_init.sh#L26-L27

Maybe that requirements file is already compiled, in which case all we need to do is update the requirements.in and recompile it?

willkg avatar Dec 08 '20 15:12 willkg

> I don't know where the requirements file is coming from. It's this line in dataproc_init.sh

It looks like it's defined within telemetry-airflow/dataproc_bootstrap and the contents of that folder are rsync'd to GCS.

Looks like it is not currently compiled, but we could do so within telemetry-airflow and commit the results.
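One rough way to tell whether a requirements file is already compiled is to look for entries without an exact pin. The following is a minimal sketch, assuming "compiled" means every requirement line carries an `==` pin, as pip-tools output normally does. The function name `unpinned_requirements` and the sample package names are hypothetical and are not taken from the repository.

```python
def unpinned_requirements(text):
    """Return the requirement lines in `text` that lack an exact '==' pin."""
    unpinned = []
    for raw in text.splitlines():
        # Drop trailing comments, then surrounding whitespace.
        line = raw.split(" #", 1)[0].strip()
        if not line or line.startswith("#"):
            continue  # blank line or full-line comment
        if line.startswith("-"):
            continue  # pip options such as -r, -e, or --hash
        # Ignore environment markers like '; python_version < "3.8"'.
        spec = line.split(";", 1)[0].strip()
        if "==" not in spec:
            unpinned.append(spec)
    return unpinned


# Hypothetical contents, not the real requirements file.
sample = """\
# example requirements
examplepkg==1.2.3
otherpkg>=2.0
thirdpkg
"""

print(unpinned_requirements(sample))
```

An empty result suggests the file is fully pinned. Any entries returned are ones the resolver still has to work out at install time.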

jklukas avatar Dec 08 '20 15:12 jklukas

> The pip issue suggests switching to a requirements.in file and pip-tools to compile it pushing the dependency resolution step out of installation time. I did that with socorro and tecken a while back. We could do that here.

I strongly agree with using pip-tools for compiling requirements. We already use it to pin dependencies for the Airflow container (https://github.com/mozilla/telemetry-airflow/pull/1008).

acmiyaguchi avatar Dec 08 '20 19:12 acmiyaguchi