Provide Bazel cache for TensorFlow builds
Providing a TensorFlow build cache could be very helpful to external developers and would lower the barrier to entry for contributing to TF.
Some ideas for this we've discussed before are:
- Offer Bazel RBE resources on behalf of SIG Build. This service is in alpha on GCP.
- Provide a read-only build cache in a GCP bucket.
- Provide devel_cache Docker images containing a build cache (these could be very large).
- Provide code-and-cache volumes for the docker devel images.
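As a rough illustration of the read-only GCS bucket idea, a contributor could point Bazel at the bucket over HTTPS and disable uploads. The sketch below only generates the corresponding .bazelrc lines. The bucket name `example-tf-build-cache` and the helper `readonly_cache_bazelrc` are hypothetical (no such bucket is announced in this thread), and it assumes the bucket would be publicly readable so that no credentials are needed.

```python
def readonly_cache_bazelrc(bucket: str) -> str:
    """Return .bazelrc lines for reading a GCS-hosted Bazel cache without uploading.

    `bucket` is a hypothetical, publicly readable GCS bucket name.
    """
    if not bucket or "/" in bucket:
        raise ValueError("expected a bare bucket name, e.g. 'example-tf-build-cache'")
    lines = [
        # GCS buckets can be used as an HTTP remote cache by Bazel.
        f"build --remote_cache=https://storage.googleapis.com/{bucket}",
        # Never try to write local results back to the shared cache.
        "build --remote_upload_local_results=false",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical bucket name, for illustration only.
    print(readonly_cache_bazelrc("example-tf-build-cache"), end="")
```

Cache hits would still depend on the contributor's toolchain and environment matching whatever populated the cache, which is why the Docker-based options above are listed alongside it.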
See also:
- https://github.com/tensorflow/tensorflow/issues/39560
- https://github.com/tensorflow/tensorflow/issues/4116
- https://github.com/tensorflow/addons/issues/1414
I'm looking into the feasibility of providing GCP resources (likely a long-term discussion) and devel_cache images as an evaluation (short-term, but no ETA).
I just want to add a reference: we may also need to solve this so that users can adopt the new GitHub Codespaces/VS Code Remote (https://github.com/tensorflow/addons/pull/1309) or Gitpod (https://github.com/tensorflow/tensorflow/pull/38755).
It would also be nice, since many SIGs build on the GitHub Actions CI infra (especially the ones with C++/CUDA custom ops), if we could find a way to reuse the Bazel cache to speed up CI builds. We have tried to store the Bazel cache in the Actions cache for CI (https://github.com/tensorflow/addons/issues/1655), but it is not working. As you can see in that ticket, we have an external request open on the GitHub Actions repo.
> It would be also nice as many SIGs builds using github Actions CI infra
This would be excellent! For reference, some time ago there were some discussions about improving bazel cache support in GitHub actions at https://github.com/actions/cache/issues/109
@lgeiger Our ticket was https://github.com/actions/cache/issues/260. I don't know if they could be fused or not.
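For context on the Actions cache approach discussed above: CI caches are usually keyed on a hash of the files that should invalidate them. The sketch below shows one way such a key could be derived for a Bazel cache directory. The helper name `bazel_cache_key` and the choice of input files (.bazelversion, .bazelrc, WORKSPACE) are assumptions for illustration, and this does not address the problems reported in the linked tickets.

```python
import hashlib
from pathlib import Path


def bazel_cache_key(repo_root, files=(".bazelversion", ".bazelrc", "WORKSPACE"),
                    prefix="bazel"):
    """Derive a cache key from the contents of files that affect the Bazel cache.

    The file list is an assumption; missing files are hashed as empty.
    """
    digest = hashlib.sha256()
    for rel in sorted(files):
        path = Path(repo_root) / rel
        # Include the file name so that moving content between files changes the key.
        digest.update(rel.encode("utf-8"))
        digest.update(b"\0")
        if path.is_file():
            digest.update(path.read_bytes())
        digest.update(b"\0")
    return f"{prefix}-{digest.hexdigest()[:16]}"


if __name__ == "__main__":
    print(bazel_cache_key("."))
```

A key like this changes only when the toolchain or workspace configuration changes, so ordinary source edits would keep restoring the same cache.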
This will be orthogonal to the approved TF modularization RFC.
We have started to explore internally whether we can share our RBE cache. We will also look into whether we can share a GCS cache.
@gunan Thanks. I've found this candidate duplicate: https://github.com/tensorflow/tensorflow/issues/34719. You can probably find some other ones on the TF repo.
Yes, this has been a long running problem for TF. And as TF gets bigger it will only get worse.
If this is going to take too much time, can we find an intermediate goal, like supporting Python-only PRs? I think that would be an easier intermediate step. What do you think?
See what kind of bad hack I need to suggest https://github.com/tensorflow/tensorflow/pull/41701#discussion_r460587524
/cc @perfinion for @gunan post in https://groups.google.com/a/tensorflow.org/forum/m/#!topic/developers/1OJLv2ew7pA
I've tested your initial cache inside the official TF Docker devel image, but that image does not have the cross tools (d7/d8) that the RBE and custom-ops Dockerfiles/images have.
We have a thread in the SIG Build Gitter channel.
This would be great. It is very frustrating that I have to spin up docker images and compile C++ code overnight just to test a single line of code change to a Python function. The barrier to entry to contributing is extremely high. What I often end up doing is copying test_xyz.py as test.py, editing the tensorflow install in my virtual env and running test.py then crossing my fingers that CI passes.
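The workaround described above starts with finding where the installed package lives inside the virtual env, so that an edited Python file can be dropped over the installed copy. A minimal sketch of that lookup, using only the standard library, is shown below. The helper name `installed_package_dir` is hypothetical, and this only locates the directory; it does nothing about compiled C++ code.

```python
import importlib.util
from pathlib import Path


def installed_package_dir(name: str) -> Path:
    """Return the directory of an installed package in the current environment."""
    spec = importlib.util.find_spec(name)
    # submodule_search_locations is only set for packages, not plain modules.
    if spec is None or not spec.submodule_search_locations:
        raise ModuleNotFoundError(f"{name!r} is not an installed package")
    return Path(list(spec.submodule_search_locations)[0])


if __name__ == "__main__":
    # Uses a standard-library package so the example runs anywhere;
    # in the workflow above the argument would be "tensorflow".
    print(installed_package_dir("json"))
```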
Also, when we mount the Bazel cache inside the official TensorFlow Docker devel container, we need to improve the stale cache handling.
Too often I see "Deleting stale sandbox base". Is it related to https://github.com/bazelbuild/bazel/issues/8525? It seems that one was closed in Bazel 3.4.0.
In the meantime can we reply to https://groups.google.com/a/tensorflow.org/g/developers/c/1OJLv2ew7pA?
Is there a quick solution to iterate and modify the source code and run an example in the source dir without building and installing the wheel?
It seems that we now have a read-only cache for TF IO, but still not for TensorFlow contributors:
https://github.com/tensorflow/io/pull/1294
@bhack Given this situation, what is the best way to build TensorFlow while making small changes to the codebase? Can you please outline the procedure? TIA
With @angerson and @perfinion we are prototyping with https://github.com/tensorflow/tensorflow/pull/48421 (and https://github.com/tensorflow/build/pull/24) to continuously execute and monitor the external developer contribution experience/overhead (compile, lint and test).
/cc @theadactyl @nikitamaia
I think we could close this and monitor the build reproducibility and cache efficiency in https://github.com/tensorflow/build/pull/48
We now have a PR at https://github.com/tensorflow/tensorflow/pull/57630 if you want to support, review, or improve this baseline.