Zhanghao Wu
Tested (run the relevant ones):
- [ ] Code formatting: install pre-commit (auto-check on commit) or `bash format.sh`
- [ ] Any manual or new tests for this PR (please...
A user hit an issue when a GKE cluster has a node pool whose instance type does not appear in the GCP catalog and `autoscaler: gke` is set in the config....
[https://www.notion.so/Release-Process-2c4d3d4bb77480b9be52ec835111b2e6](https://www.notion.so/Release-Process-2c4d3d4bb77480b9be52ec835111b2e6)
> We mainly want to be able to specify something like a "job group" in a single YAML file and launch/stop it with a single command line. Each job in...
This example enables large-scale parallel model evaluation with SkyPilot + Promptfoo.

### Why SkyPilot?

**SkyPilot automates the complex infrastructure setup** needed for large-scale model evaluation:...
## Summary
- format CPU values in resource strings to preserve fractional requests
- add a unit test ensuring fractional Kubernetes CPU requests display correctly

## Testing
- pytest tests/unit_tests/test_sky/utils/test_cli_utils.py...
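The summary above does not include the change itself. A minimal sketch of the idea, assuming CPU counts arrive as numbers (ints or floats); `format_cpus` is a hypothetical helper, not SkyPilot's actual function. It keeps whole counts compact while leaving fractional Kubernetes requests such as `0.5` intact, where a plain `int()` cast would truncate them to `0`.

```python
from typing import Union


def format_cpus(cpus: Union[int, float]) -> str:
    """Format a CPU count for display without dropping fractional parts.

    Hypothetical helper for illustration only; not SkyPilot's API.
    """
    value = float(cpus)
    if value.is_integer():
        # Whole counts print without a trailing ".0" (2.0 -> "2").
        return str(int(value))
    # "g" drops trailing zeros but keeps the fraction (0.5 -> "0.5").
    # It uses up to 6 significant digits, ample for CPU requests.
    return f"{value:g}"


# A hypothetical resource string; the real SkyPilot format may differ.
print(f"cpus={format_cpus(0.5)}")
```

Using `g` instead of a fixed `.1f` avoids printing `2.0` for whole counts and avoids rounding a `0.25` request to one decimal place.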
When uploading files to the API server, the httpx client was shared across multiple parallel upload threads, causing SSL_ALERT_BAD_RECORD_MAC errors due to corrupted SSL state from concurrent connection reuse. This...
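The description is cut off before it states the fix. One common way to avoid sharing a single client's connection pool across threads is to give each thread its own client. The sketch below assumes that approach; `PerThreadClient` is a hypothetical helper, not an httpx or SkyPilot API, and the factory would typically be something like `httpx.Client`.

```python
import threading
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class PerThreadClient(Generic[T]):
    """Lazily creates one client per thread instead of sharing one globally.

    Hypothetical helper for illustration; not part of httpx or SkyPilot.
    """

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._local = threading.local()  # per-thread storage
        self._created: List[T] = []      # every client ever created
        self._lock = threading.Lock()    # guards self._created

    def get(self) -> T:
        client = getattr(self._local, "client", None)
        if client is None:
            # First call on this thread: build a private client so no
            # TLS connection state is shared with other threads.
            client = self._factory()
            self._local.client = client
            with self._lock:
                self._created.append(client)
        return client

    def close_all(self) -> None:
        """Close every created client; call only after all workers finish."""
        with self._lock:
            for client in self._created:
                close = getattr(client, "close", None)
                if close is not None:
                    close()
            self._created.clear()
```

In an upload worker, each thread would call `clients.get()` and use the returned client (for example `clients.get().post(url, content=chunk)` with httpx), and the caller would invoke `clients.close_all()` once the thread pool has shut down.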
Changing the context name in `~/.kube/config` causes volume usage to fail.
```yaml
file_mounts:
  /buckets/my-models:
    name: skypilot-agent-models
    store: s3
    mode: MOUNT_CACHED
```

Trying the above on a GKE cluster, I get the following error in the `kubectl logs` output:

```
Installing missing packages...
```
Move our examples that have long setup times to `uv` to reduce cold-start time.