tinygrad
ci: cache downloads
This should fix #1180 by using the cache in `actions/setup-python@v4` to store the downloaded files, and `actions/cache@v3` to skip installation by caching the Python location when `setup.py` is unchanged. `actions/cache@v3` also stores the efficientnet model by caching `test/models/efficientnet`.
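For illustration, the cached setup described above would look roughly like the sketch below; the step names, Python version, and cache keys are guesses, not the PR's actual diff.

```yaml
# Hedged sketch of the cached CI steps; names and keys are illustrative.
steps:
  - uses: actions/checkout@v3
  - uses: actions/setup-python@v4
    with:
      python-version: '3.11'
      cache: 'pip'          # setup-python caches the downloaded wheels
  - name: Cache the installed environment
    uses: actions/cache@v3
    with:
      # Reuse the installed packages as long as setup.py is unchanged.
      path: ${{ env.pythonLocation }}
      key: python-${{ runner.os }}-${{ hashFiles('setup.py') }}
  - name: Cache the efficientnet test model
    uses: actions/cache@v3
    with:
      path: test/models/efficientnet
      key: efficientnet-weights
```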
This does not cache files that are downloaded while running the tests; for example, the efficientnet test downloads the model but does not currently cache it.
It will require the tests to actually download files instead of just fetching them, I think.
should be fixed now!
What is being saved? Runner time? Bandwidth? Ingress?
Based on the tqdm timings, the downloads are near-instant because the download speed is so high.
Creating the cache, however, adds 1 min (CLANG ubuntu-latest job) or 2 min (CLANG windows job). Is the cache action compressing things or something?
Some bandwidth and runner time, since it'll cache both the Python deps and the model downloads, though that's marginal since the download speed is high, as you said. Creating the cache only happens when there are changes to `setup.py`. In this case it takes time to create because there's no existing cache; once it's there, the cache would not need to be updated frequently. I'd suggest having the discussion in the related issue, since this PR was created to address the issue that was raised and I did not raise the issue.
@geohot it would be great if you could re-trigger a workflow run on this PR to compare the difference.
Any evidence this is faster?
I think it is, by a margin. Would it be possible for you to trigger a re-run on this PR without making changes to the action yml? We could compare it to the last run, which should have it cached already. Earlier on, I think it did go down to 8 minutes on some runs when making changes.
Triggered
@geohot fixed some issues with the setup-python cache since it was conflicting with `actions/cache`. Based on the latest run, I do notice that it speeds up installing the deps and the torch model downloads significantly. Some parts that are slow are test-related, so I'm not sure whether it's actually helpful. Also, other model downloads (e.g. vit_models) could potentially be faster; not sure if that's something worth working on. Let me know what you think. It would be great if someone could provide me the paths where the various models exist, as I do not have a GPU-enabled PC on my end.
@unofficialquant
The downloaded models are stored in a tempfile based on a hash of the URL, or directly read (`fetch` and `fetch_as_file`).
https://github.com/tinygrad/tinygrad/blob/e2f6b09ffde3de184421c1dff164be1edee95cca/extra/utils.py#L16
If those are cached, you probably need a `.exists()` check to avoid re-downloading.
What about caching all `.pth` files in the tempdir using a glob pattern? For that to work, the filenames in those fetch functions would have to be changed to retain the extension. A rough sketch of such a cache step follows below.
This cache could potentially even be shared between jobs.
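Such a glob-based cache step might look like the following; the directory is hypothetical, and (as noted in the next comment) the default tempdir would have to change for the cache action to reach the files.

```yaml
# Illustrative only: assumes the fetch helpers are changed to keep the .pth
# extension and to write somewhere the cache action can reach.
- name: Cache downloaded .pth weights
  uses: actions/cache@v3
  with:
    path: ~/model-cache/**/*.pth   # hypothetical location, not the current tempdir
    key: model-weights-v1          # static key: one entry, restorable by any job using it
```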
The issue with `tempfile` is that it creates the directory under `/var/folders`. I've tested earlier, and GitHub Actions is not able to cache anything from that directory. We'd have to store all the models in some other directory in order to cache them.
To allow caching of the weights, they would have to be stored somewhere else: either always (in a folder ignored by git), or only when CI jobs are run (e.g. `if getenv('CI', False)` and then `CI=1 python -m pytest ...`). Two locations seem natural: `/weights` or `/cache`; both are already ignored by git. By not including the job name in the key, this cache could be shared between jobs if the required weights largely overlap.
Generally speaking, I think caching the weight downloads is a great idea; however, caching the Python virtual environments can be problematic. `setup.py` does not pin the versions of the dependencies, so the latest versions are currently used. By caching the environment in the jobs, packages will get stuck on the versions that were in use when the cache was created, which means local environments will diverge from the CI environments. Eventually, an API will change somewhere and weird job failures will occur.
NOTE: I have no standing here, I am just giving my opinion.