telemetry-airflow
fix requirements installed by dataproc_init.sh
dataproc_init.sh runs pip 20.3.1 with the new resolver. The requirements it tries to install conflict with one another, so the resolver takes effectively forever to find a set of packages that satisfies them.
https://github.com/mozilla/telemetry-airflow/issues/1198 covers changing dataproc_init.sh to use pip<20.3.0, which puts us back on the old resolver.
This issue covers fixing our requirements so that they don't cause the infinite-time-to-resolve problem.
The pip issue suggests switching to a requirements.in file and using pip-tools to compile it, which moves the dependency resolution step out of installation time. I did that with socorro and tecken a while back. We could do that here.
I don't know where the requirements file is coming from. It's this line in dataproc_init.sh:
https://github.com/mozilla/telemetry-airflow/blob/60f7dd1236193da4cfb73ab71210fb129b9638df/dataproc_bootstrap/dataproc_init.sh#L26-L27
Maybe that requirements file is already compiled, in which case all we need to do is update the requirements.in and recompile it?
> I don't know where the requirements file is coming from. It's this line in dataproc_init.sh
It looks like it's defined within telemetry-airflow/dataproc_bootstrap and the contents of that folder are rsync'd to GCS.
Looks like it is not currently compiled, but we could do so within telemetry-airflow and commit the results.
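One quick way to confirm whether a requirements file is already compiled is to check that every entry is pinned with `==`. The following is a minimal sketch, not something that exists in the repo: the function name `unpinned_requirements` and the sample package lines are hypothetical, and the regex only covers the common `name==version` form.

```python
import re

# Matches "name==version", optionally with extras such as "name[extra]==1.0".
# This is a simplification of the full requirement syntax (assumption).
_PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?\s*==\s*[^\s;\\]+")


def unpinned_requirements(text):
    """Return the requirement lines in `text` that are not pinned with ==."""
    loose = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and option lines such as "-r other.txt"
        # or the "--hash=..." continuation lines that pip-compile can emit.
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        # Drop a trailing inline comment such as "  # via requests".
        line = line.split(" #", 1)[0].strip()
        if not _PINNED.match(line):
            loose.append(line)
    return loose


# Hypothetical file contents, for illustration only.
sample = """\
# example requirements
requests==2.25.0
numpy>=1.19
click
"""
print(unpinned_requirements(sample))
```

An empty result suggests the file is fully pinned. Any entries it returns still have to be resolved at install time.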
> The pip issue suggests switching to a requirements.in file and pip-tools to compile it pushing the dependency resolution step out of installation time. I did that with socorro and tecken a while back. We could do that here.
I strongly agree with using pip-tools for compiling requirements. We use it for the airflow container to pin dependencies (https://github.com/mozilla/telemetry-airflow/pull/1008).
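For reference, the proposed workflow can be sketched roughly as below. This is only an outline under assumptions: the file names `requirements.in` and `requirements.txt` and the helper names are hypothetical, and where the compile step would run (locally or in CI) is not decided in this thread. `pip-compile` comes from pip-tools and must be installed separately.

```python
import subprocess


def compile_command(src="requirements.in", out="requirements.txt"):
    """Command that resolves `src` once and writes fully pinned versions to `out`."""
    return ["pip-compile", "--output-file", out, src]


def install_command(out="requirements.txt"):
    """Command an init script could run; every version is already pinned."""
    return ["python", "-m", "pip", "install", "-r", out]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Dry run: shows the two steps without touching the environment.
run([compile_command(), install_command()])
```

The idea is that the compile step runs at development time and its output is committed, so the install step on the cluster has only exact pins to deal with.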