torchtune Periodic testing of external APIs

We are adding several dependencies on external services to our codebase, such as huggingface_hub and wandb.

As mentioned by @laurencer, it's important for us to keep an eye on these services to ensure they are up and running properly. Filing this issue so we can come back and consider adding some form of testing that periodically pings these APIs and checks for a 200 OK response.
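For illustration, here is a minimal sketch of what such a ping check could look like. The endpoint URLs and the names `ENDPOINTS`, `fetch_status`, `check_endpoints`, and `main` are assumptions made up for this example, not anything that exists in torchtune, and a real check would likely go through each service's own client library instead.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

# Hypothetical endpoints, chosen for illustration only; the real URLs to
# probe would depend on which parts of each service we rely on.
ENDPOINTS: Dict[str, str] = {
    "huggingface_hub": "https://huggingface.co",
    "wandb": "https://api.wandb.ai",
}


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET to `url`, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx/5xx).
        return err.code
    except OSError:
        # DNS failure, refused connection, timeout, etc.
        return 0


def check_endpoints(
    endpoints: Dict[str, str],
    fetch: Callable[[str], int] = fetch_status,
) -> Dict[str, int]:
    """Return {name: status} for every endpoint that did not answer 200."""
    failures = {}
    for name, url in endpoints.items():
        status = fetch(url)
        if status != 200:
            failures[name] = status
    return failures


def main() -> None:
    """Entry point a scheduled job could call; exits non-zero on failure."""
    failures = check_endpoints(ENDPOINTS)
    if failures:
        raise SystemExit(f"Unhealthy endpoints: {failures}")
    print("All endpoints returned 200 OK")
```

The `fetch` parameter is there so the check itself can be unit tested without network access; only the scheduled job would call `main()` and hit the real services.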

Jan 30 '24 15:01 joecummings

Torchvision has a CI job that runs daily and tries to download the datasets that torchvision exposes. If a download fails, the job opens an issue like these ones.

Link to job: https://github.com/pytorch/vision/blob/main/.github/workflows/tests-schedule.yml (there are links to the underlying files from there).

Hope this helps

Jan 30 '24 15:01 NicolasHug

This is exactly what we want, I believe. Thanks!

I might be overthinking this, but do you have a sense of whether a semi-random daily schedule or a strictly periodic one would be more helpful?

Jan 30 '24 16:01 joecummings

I haven't really thought about it, but I don't see any obvious reason for making it random. When there are failures, debugging can become a pain pretty quickly, and any source of variability (like a semi-random schedule) could potentially add confusion.

Jan 30 '24 16:01 NicolasHug

With the change in https://github.com/pytorch-labs/torchtune/pull/289, we will no longer test HF dataset downloads in unit tests. We do still want a nightly run that tests the HF load_dataset API.
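A rough sketch of what that nightly check might do, assuming the `datasets` package is installed in the nightly environment. The function name `smoke_test_load_dataset` and the dataset name in the usage note are placeholders for illustration, not something decided in this thread.

```python
from typing import Any, Callable, Optional


def smoke_test_load_dataset(
    name: str,
    split: str = "train",
    loader: Optional[Callable[..., Any]] = None,
) -> Any:
    """Load `name` in streaming mode and return its first example.

    Raises if the dataset cannot be loaded or yields an empty first row.
    `loader` is injectable so the logic can be tested without network access.
    """
    if loader is None:
        # Imported lazily so this module can be imported without `datasets`.
        from datasets import load_dataset

        loader = load_dataset

    # streaming=True avoids downloading the whole dataset just to check
    # that the API is reachable and the dataset still resolves.
    dataset = loader(name, split=split, streaming=True)
    first = next(iter(dataset))
    if not first:
        raise RuntimeError(f"Dataset {name!r} returned an empty first example")
    return first
```

A nightly job could then call, for example, `smoke_test_load_dataset("tatsu-lab/alpaca")` (dataset name assumed here) and let any exception fail the run.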

Feb 01 '24 18:02 gokulavasan

Tracking this in #691 instead.

Apr 21 '24 16:04 kartikayk