Add nightly testing for extra_large tests
Some of our tests are very resource intensive to run and are marked with an "extra large" annotation. This includes our fenced docstring tests and the tests for most of our presets.
We should try to get automated coverage for these to avoid things like https://github.com/keras-team/keras-nlp/issues/782, though these tests are too slow to run on every PR.
One good solution would be to run these tests on GCP every night, and to build a way to invoke them manually (e.g. before cutting a new release).
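As a rough sketch of the opt-in side of this, the snippet below skips extra large tests unless a flag is set, so that a nightly job or a manual pre-release run can enable them. It uses only the standard library `unittest` skip decorator. The environment variable `RUN_EXTRA_LARGE`, the `extra_large` decorator, and `PresetTest` are hypothetical names for illustration, not the project's actual setup.

```python
import os
import unittest

# Hypothetical environment variable; a nightly job or a manual
# pre-release run would set it to "1".
RUN_EXTRA_LARGE_ENV = "RUN_EXTRA_LARGE"


def extra_large_enabled(environ=None):
    """Return True when the opt-in variable is set to "1"."""
    environ = os.environ if environ is None else environ
    return environ.get(RUN_EXTRA_LARGE_ENV) == "1"


def extra_large(test_item):
    """Mark a test function or TestCase class as extra large.

    The flag is read when the decorator is applied (at import time),
    and the test is skipped unless extra large tests are enabled.
    """
    return unittest.skipUnless(
        extra_large_enabled(),
        f"extra large test; set {RUN_EXTRA_LARGE_ENV}=1 to run",
    )(test_item)


class PresetTest(unittest.TestCase):
    def test_small(self):
        # Cheap check that can run on every PR.
        self.assertEqual(1 + 1, 2)

    @extra_large
    def test_all_presets(self):
        # Placeholder for an expensive check, e.g. loading every preset.
        self.assertTrue(True)
```

With something like this, regular PR runs would skip `test_all_presets`, while a nightly or manual run would export `RUN_EXTRA_LARGE=1` before invoking the test runner. Because the skip is a plain `unittest` decorator, it should behave the same under both `unittest` and `pytest`.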
Sadly this is probably not a good "contributions welcome" issue, as it will require provisioning machines for our test infra.
@chenmoneygithub would this be easy to set up with our current GCP testing setup?
Unfortunately it's annoyingly complex, and I don't think we can get this infra set up within 1-2 weeks. IMO it will need the components below:
- Testing trigger: we can use ml-accelerator-testing offerings.
- Computing resources: we can use GKE or just GCE; I would need to dig into it more.
- Reports storage: no concrete idea, maybe BigStore or just a bucket?
- Dashboard: I vaguely remember GCP has such support, but configs are required.
- Notification (email): GCP has email APIs connecting to Spanner, but I'm unsure if it has the same thing for other storage.
- Feedback to GitHub: this is probably unsupported but pretty important IMO; having different places to track results would be hard to follow and prone to getting stale.
So overall we need to write a design for this infra, and it can be used for KerasCV and our other projects in the future as well. Actually, this could be interesting infra that other repos can use too.