
sync.sh script fails for some models (Llama-2-70b being one of them)

Open noah-yoshida opened this issue 1 year ago • 0 comments

System Info

predibase

Information

  • [ ] Docker
  • [ ] The CLI directly

Tasks

  • [ ] An officially supported command
  • [ ] My own modifications

Reproduction

  • Try to use the sync.sh script to download Llama-2-70b. The download fails with the output below:
An error occurred (NoSuchBucket) when calling the ListObjectsV2 operation: The specified bucket does not exist
No files found in the cache s3://huggingface-model-cache/models--meta-llama--Llama-2-70b-hf/. Downloading from HuggingFace Hub.
Received arguments: --download-only
2024-02-12T19:49:56.248834Z  INFO lorax_launcher: Args { model_id: "meta-llama/Llama-2-70b-hf", adapter_id: "", source: "hub", adapter_source: "hub", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: None, compile: false, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, max_active_adapters: 128, adapter_cycle_time_s: 2, hostname: "llm-deployment-llama-2-70b-78d75cc765-6mn8x", port: 80, shard_uds_path: "/tmp/lorax-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false, download_only: true }
2024-02-12T19:49:56.248969Z  INFO download: lorax_launcher: Starting download process.
2024-02-12T19:49:58.651649Z ERROR download: lorax_launcher: Download encountered an error: Traceback (most recent call last):

Error: DownloadError
  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py", line 311, in _lazy_init
    queued_call()

  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py", line 180, in _check_capability
    capability = get_device_capability(d)

  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py", line 435, in get_device_capability
    prop = get_device_properties(device)

  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py", line 453, in get_device_properties
    return _get_device_properties(device)  # type: ignore[name-defined]

RuntimeError: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1702400366987/work/aten/src/ATen/cuda/CUDAContext.cpp":50, please report a bug to PyTorch. device=1, num_gpus=


The above exception was the direct cause of the following exception:


Traceback (most recent call last):
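For context, the RuntimeError above comes from torch.cuda's lazy initialization: PyTorch queues a capability check for each device index it expects to use, and the internal assert fires because device index 1 is probed while the CUDA runtime reports fewer visible GPUs (the num_gpus= value is truncated in the log above). A quick sanity check run inside the same pod as the launcher can confirm what the process actually sees; this snippet is a generic PyTorch check, not part of LoRAX:

    # Compare what the environment exposes with what PyTorch's CUDA runtime sees.
    # Run inside the same container/pod that executes sync.sh / the lorax launcher.
    import os

    import torch

    print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES", "<unset>"))
    print("torch.cuda.is_available():", torch.cuda.is_available())
    print("torch.cuda.device_count():", torch.cuda.device_count())
    for i in range(torch.cuda.device_count()):
        print(f"  device {i}: {torch.cuda.get_device_name(i)}")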

Expected behavior

  • It shouldn't fail: sync.sh should download the Llama-2-70b weights the same way it does for other models.
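Until the underlying issue is addressed, one possible mitigation, sketched below under the assumption that the download-only path does not actually need a GPU, is to hide CUDA devices from the process so torch.cuda's lazy init (and its per-device capability checks) is never triggered while weights are being fetched. Hiding devices via CUDA_VISIBLE_DEVICES is standard CUDA behavior; the main()/flag handling here is purely illustrative and is not LoRAX's real entrypoint:

    import os
    import sys

    def main() -> None:
        # Hypothetical wrapper around a weight-download step; not LoRAX code.
        if "--download-only" in sys.argv:
            # Hide all GPUs before any CUDA call so torch.cuda._lazy_init()
            # never runs its per-device capability checks during a pure download.
            os.environ["CUDA_VISIBLE_DEVICES"] = ""

        import torch  # imported after the environment is adjusted

        print("CUDA visible during download:", torch.cuda.is_available())
        # ... fetch model weights from the S3 cache or the HuggingFace Hub here ...

    if __name__ == "__main__":
        main()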

noah-yoshida · Feb 12 '24, 20:02