pygraphistry
CI: HuggingFace cache + offline runs for feature_utils tests
Summary
The feature_utils tests hit Hugging Face (HF) rate limits (HTTP 429) during model downloads. Add a deterministic HF cache to CI and run the tests in offline mode.
Proposal
- Add a warmup + cache job before matrix runs:
  - Set `HF_HOME` (or `HF_HUB_CACHE`) to `${{ runner.temp }}/hf-cache`.
  - Use `actions/cache` with key `hf-cache-${{ runner.os }}-v1` (stable per-OS; no per-Python split).
  - Warm once using `HF_TOKEN` secret for:
    - `sentence-transformers/average_word_embeddings_komninos`
    - `sentence-transformers/paraphrase-MiniLM-L6-v2`
    - `sentence-transformers/paraphrase-albert-small-v2`
  - After restore/warm, set `HF_HUB_OFFLINE=1` for all test steps.
- All matrix jobs restore the same cache; bump the key version only when the model set changes.
- Optional: pre-bake the cache into GHCR or a release asset for zero-network runs.
- Optional: lower parallelism for HF-heavy jobs if needed.
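
The warm step could be a small script along these lines. This is a sketch, not the project's actual tooling: the script itself and the names `MODELS`, `cache_key`, `warm_cache`, and `main` are hypothetical. It assumes the workflow step has already exported `HF_HOME`, so `huggingface_hub` writes into the cached directory, and that the `sentence-transformers` version used by the tests reads from that same hub cache.

```python
import os
from typing import List, Optional

# Model IDs listed in the proposal above.
MODELS = [
    "sentence-transformers/average_word_embeddings_komninos",
    "sentence-transformers/paraphrase-MiniLM-L6-v2",
    "sentence-transformers/paraphrase-albert-small-v2",
]


def cache_key(runner_os: str, version: int = 1) -> str:
    """Mirror the proposed actions/cache key: stable per OS, bumped by hand."""
    return f"hf-cache-{runner_os}-v{version}"


def warm_cache(token: Optional[str]) -> List[str]:
    """Download every model into the HF cache; return the IDs fetched."""
    # Imported lazily so this module still loads where huggingface_hub
    # is not installed (assumption: it is installed in the warm job).
    from huggingface_hub import snapshot_download

    fetched = []
    for repo_id in MODELS:
        # The cache location comes from HF_HOME / HF_HUB_CACHE in the environment.
        snapshot_download(repo_id=repo_id, token=token)
        fetched.append(repo_id)
    return fetched


def main() -> None:
    token = os.environ.get("HF_TOKEN")
    if not token:
        # Matches the note below: without a token, skip the warm step.
        print("HF_TOKEN not set; skipping cache warm")
        return
    for repo_id in warm_cache(token):
        print(f"cached {repo_id}")


if __name__ == "__main__":
    main()
```

Once this has run (or the cache has been restored), the test steps would export `HF_HUB_OFFLINE=1` so that any model missing from the cache fails fast instead of reaching the network.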
Rationale
- Unblocks `graphistry/tests/test_feature_utils.py` failures (429s/missing HF files).
- Makes CI deterministic and faster by avoiding external throttling.
Notes
- Models are ~250MB total, fine for cache/asset.
- If `HF_TOKEN` is missing, skip warm and rely on existing cache or skip HF tests.
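
The fallback in the last note can be written as a small decision helper. This is a minimal sketch under stated assumptions: `cache_is_populated` and `hf_mode` are hypothetical names, "populated" is approximated as "the directory exists and is not empty", and preferring a restored cache over re-warming when a token is also present is an illustrative choice rather than something the proposal specifies.

```python
from pathlib import Path


def cache_is_populated(cache_dir: Path) -> bool:
    """Treat a cache directory as usable if it exists and is not empty."""
    return cache_dir.is_dir() and any(cache_dir.iterdir())


def hf_mode(has_token: bool, has_cache: bool) -> str:
    """Pick how a CI job should treat the HF-dependent tests.

    Returns one of:
      "offline": a cache was restored; run tests with HF_HUB_OFFLINE=1
      "warm":    no cache, but a token is available; download the models first
      "skip":    neither cache nor token; skip the HF tests
    """
    if has_cache:
        return "offline"
    if has_token:
        return "warm"
    return "skip"
```

A workflow step or a `conftest.py` could call `hf_mode(bool(os.environ.get("HF_TOKEN")), cache_is_populated(Path(os.environ["HF_HOME"])))` and branch on the result.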