
CI: HuggingFace cache + offline runs for feature_utils tests

Open • lmeyerov opened this issue 1 month ago • 0 comments

Summary

The feature_utils tests hit Hugging Face rate limits (HTTP 429) while downloading models. Add a deterministic HF cache plus offline mode in CI.

Proposal

  • Add a warmup + cache job before matrix runs:
    • Set HF_HOME (or HF_HUB_CACHE) to ${{ runner.temp }}/hf-cache.
    • Use actions/cache with key hf-cache-${{ runner.os }}-v1 (stable per-OS; no per-Python split).
    • Warm the cache once, authenticated with the HF_TOKEN secret, for:
      • sentence-transformers/average_word_embeddings_komninos
      • sentence-transformers/paraphrase-MiniLM-L6-v2
      • sentence-transformers/paraphrase-albert-small-v2
    • After restore/warm, set HF_HUB_OFFLINE=1 for all test steps.
  • All matrix jobs restore the same cache; bump key version only when model set changes.
  • Optional: pre-bake the cache into GHCR or a release asset for zero-network runs.
  • Optional: lower parallelism for HF-heavy jobs if needed.
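A minimal sketch of the warm step, assuming the `huggingface_hub` package is installed and HF_HOME (or HF_HUB_CACHE) already points at the cached path. `snapshot_download` is the real huggingface_hub function; `warm_hf_cache` and its injectable `download` parameter are hypothetical names used for illustration. With no token it fetches nothing, matching the "skip warm" fallback in the notes.

```python
import os
from typing import Callable, List, Optional

# The three models named in this issue.
MODELS = [
    "sentence-transformers/average_word_embeddings_komninos",
    "sentence-transformers/paraphrase-MiniLM-L6-v2",
    "sentence-transformers/paraphrase-albert-small-v2",
]


def warm_hf_cache(
    models: List[str],
    token: Optional[str],
    download: Optional[Callable[..., str]] = None,
) -> List[str]:
    """Hypothetical helper: fetch each model into the HF hub cache.

    Returns the repo IDs that were downloaded. With no token, nothing is
    fetched and [] is returned, so CI falls back to the restored cache.
    """
    if not token:
        return []
    if download is None:
        # Lazy import: huggingface_hub reads HF_HOME / HF_HUB_CACHE at import
        # time, so those env vars must already be set at this point.
        from huggingface_hub import snapshot_download

        download = snapshot_download
    warmed = []
    for repo_id in models:
        download(repo_id=repo_id, token=token)
        warmed.append(repo_id)
    return warmed


if __name__ == "__main__":
    done = warm_hf_cache(MODELS, os.environ.get("HF_TOKEN"))
    print(f"warmed {len(done)} model(s)")
```

In the workflow this would run only in the warmup job (ideally only on a cache miss), with HF_HUB_OFFLINE=1 set afterwards for the test steps. The `download` parameter exists so the loop can be exercised without network access.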

Rationale

  • Fixes the graphistry/tests/test_feature_utils.py failures (HTTP 429s / missing HF files).
  • Makes CI deterministic and faster by avoiding external throttling.

Notes

  • The models total ~250 MB, small enough for an Actions cache or a release asset.
  • If HF_TOKEN is missing, skip the warm step and rely on the existing cache, or skip the HF tests.
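For the skip path, a stdlib-only sketch that checks whether each model has a snapshot in the hub cache (layout `models--{org}--{name}/snapshots/{revision}/`) and returns a skip reason when running offline. `hf_hub_cache_dir`, `is_model_cached`, and `hf_skip_reason` are hypothetical helpers, not pygraphistry or huggingface_hub APIs; the directory resolution is a simplified mirror of huggingface_hub's defaults and ignores the legacy HUGGINGFACE_HUB_CACHE variable.

```python
import os
from pathlib import Path
from typing import List, Optional

# Values huggingface_hub treats as "true" for env flags such as HF_HUB_OFFLINE.
_TRUE = {"1", "ON", "YES", "TRUE"}


def hf_hub_cache_dir() -> Path:
    """Simplified resolution: HF_HUB_CACHE, else $HF_HOME/hub, else the XDG default."""
    if os.environ.get("HF_HUB_CACHE"):
        return Path(os.environ["HF_HUB_CACHE"]).expanduser()
    xdg = os.environ.get("XDG_CACHE_HOME", os.path.join(os.path.expanduser("~"), ".cache"))
    home = os.environ.get("HF_HOME", os.path.join(xdg, "huggingface"))
    return Path(home).expanduser() / "hub"


def is_model_cached(repo_id: str, cache_dir: Optional[Path] = None) -> bool:
    """Coarse check: at least one snapshot folder exists (files are not verified)."""
    cache = cache_dir if cache_dir is not None else hf_hub_cache_dir()
    snapshots = cache / ("models--" + repo_id.replace("/", "--")) / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())


def hf_skip_reason(models: List[str], cache_dir: Optional[Path] = None) -> Optional[str]:
    """Return a skip reason if offline and any model is missing, else None."""
    offline = os.environ.get("HF_HUB_OFFLINE", "").upper() in _TRUE
    missing = [m for m in models if not is_model_cached(m, cache_dir)]
    if offline and missing:
        return "HF models not in cache (offline): " + ", ".join(missing)
    return None
```

A test module could call `pytest.skip(reason)` whenever the helper returns a string, so a cold cache on a token-less run degrades to skipped tests instead of 429s or missing-file errors.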

lmeyerov • Nov 29 '25 21:11