sync.sh script fails for some models (Llama-2-70b being one of them)
System Info
Predibase
Information
- [ ] Docker
- [ ] The CLI directly
Tasks
- [ ] An officially supported command
- [ ] My own modifications
Reproduction
- Run the sync.sh script to download meta-llama/Llama-2-70b-hf.
An error occurred (NoSuchBucket) when calling the ListObjectsV2 operation: The specified bucket does not exist
No files found in the cache s3://huggingface-model-cache/models--meta-llama--Llama-2-70b-hf/. Downloading from HuggingFace Hub.
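The NoSuchBucket error can be confirmed independently of sync.sh. A minimal sketch, assuming boto3 is installed and the pod uses the same AWS credentials as the script (the bucket name is copied from the log above):

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
try:
    # Same bucket the sync.sh log references.
    s3.head_bucket(Bucket="huggingface-model-cache")
    print("bucket exists and is reachable")
except ClientError as e:
    # "404" means the bucket does not exist; "403" means it exists
    # but these credentials cannot access it.
    print("head_bucket failed:", e.response["Error"]["Code"])
```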
Received arguments: --download-only
2024-02-12T19:49:56.248834Z INFO lorax_launcher: Args { model_id: "meta-llama/Llama-2-70b-hf", adapter_id: "", source: "hub", adapter_source: "hub", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: None, compile: false, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, max_active_adapters: 128, adapter_cycle_time_s: 2, hostname: "llm-deployment-llama-2-70b-78d75cc765-6mn8x", port: 80, shard_uds_path: "/tmp/lorax-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false, download_only: true }
2024-02-12T19:49:56.248969Z INFO download: lorax_launcher: Starting download process.
2024-02-12T19:49:58.651649Z ERROR download: lorax_launcher: Download encountered an error: Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py", line 311, in _lazy_init
    queued_call()
  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py", line 180, in _check_capability
    capability = get_device_capability(d)
  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py", line 435, in get_device_capability
    prop = get_device_properties(device)
  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py", line 453, in get_device_properties
    return _get_device_properties(device)  # type: ignore[name-defined]
RuntimeError: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1702400366987/work/aten/src/ATen/cuda/CUDAContext.cpp":50, please report a bug to PyTorch. device=1, num_gpus=

The above exception was the direct cause of the following exception:

Traceback (most recent call last):

Error: DownloadError
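The RuntimeError comes from torch's lazy CUDA initialization: device 1 is being queried while fewer GPUs appear to be visible (the num_gpus value is truncated in the log). A minimal sketch to check what torch actually sees inside the pod, using only standard torch APIs (nothing lorax-specific):

```python
import torch

# Report how many GPUs torch can see; the assert fired for device=1,
# so anything less than 2 here would explain the failure.
print("CUDA available:", torch.cuda.is_available())
n = torch.cuda.device_count()
print("visible GPUs:", n)
for d in range(n):
    print(f"device {d}: capability {torch.cuda.get_device_capability(d)}")
```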
Expected behavior
- It shouldn't fail: when the S3 cache is missing, sync.sh should fall back to downloading the weights from the HuggingFace Hub and complete cleanly.