OSError: /models/deepseek-r1 does not appear to have a file named configuration_deepseek.py. Checkout 'https://huggingface.co//models/deepseek-r1/tree/None' for available files.
🐛 Describe the bug
INFO 03-09 00:16:28 api_server.py:913] args: Namespace(subparser='serve', model_tag='/models/deepseek-r1', config='', host=None, port=8000, uvicorn_log_level='warning', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, enable_reasoning=False, reasoning_parser=None, tool_call_parser=None, tool_parser_plugin='', model='/models/deepseek-r1', task='auto', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=True, allowed_local_media_path=None, download_dir=None, load_format='auto', config_format=<ConfigFormat.AUTO: 'auto'>, dtype='auto', kv_cache_dtype='auto', max_model_len=None, guided_decoding_backend='xgrammar', logits_processor_pattern=None, model_impl='auto', distributed_executor_backend='ray', pipeline_parallel_size=1, tensor_parallel_size=16, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=None, enable_prefix_caching=None, disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=0, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_partial_prefills=1, max_long_partial_prefills=1, long_prefill_token_threshold=0, max_num_seqs=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, 
limit_mm_per_prompt=None, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=['deepseek-r1-671b'], qlora_adapter_name_or_path=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', scheduler_cls='vllm.core.scheduler.Scheduler', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', generation_config=None, override_generation_config=None, enable_sleep_mode=False, calculate_kv_scales=False, additional_config=None, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False, dispatch_function=<function ServeSubcommand.cmd at 0x7f0b77561d00>)
INFO 03-09 00:16:28 api_server.py:209] Started engine process with PID 734
Could not locate the configuration_deepseek.py inside /models/deepseek-r1.
Traceback (most recent call last):
File "/usr/local/bin/vllm", line 10, in <module>
sys.exit(main())
^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 73, in main
args.dispatch_function(args)
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 34, in cmd
uvloop.run(run_server(args))
File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
return __asyncio.run(
^^^^^^^^^^^^^^
File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
return await main
^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 947, in run_server
async with build_async_engine_client(args) as engine_client:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 139, in build_async_engine_client
async with build_async_engine_client_from_engine_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 220, in build_async_engine_client_from_engine_args
engine_config = engine_args.create_engine_config()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1127, in create_engine_config
model_config = self.create_model_config()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1047, in create_model_config
return ModelConfig(
^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/config.py", line 304, in __init__
hf_config = get_config(self.model, trust_remote_code, revision,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/config.py", line 287, in get_config
config = AutoConfig.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 1084, in from_pretrained
config_class = get_class_from_dynamic_module(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/dynamic_module_utils.py", line 541, in get_class_from_dynamic_module
final_module = get_cached_module_file(
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/dynamic_module_utils.py", line 345, in get_cached_module_file
resolved_module_file = cached_file(
^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/utils/hub.py", line 313, in cached_file
raise EnvironmentError(
OSError: /models/deepseek-r1 does not appear to have a file named configuration_deepseek.py. Checkout 'https://huggingface.co//models/deepseek-r1/tree/None' for available files.
INFO 03-09 00:16:31 __init__.py:207] Automatically detected platform cuda.
Could not locate the configuration_deepseek.py inside /models/deepseek-r1.
ERROR 03-09 00:16:32 engine.py:400] /models/deepseek-r1 does not appear to have a file named configuration_deepseek.py. Checkout 'https://huggingface.co//models/deepseek-r1/tree/None' for available files.
ERROR 03-09 00:16:32 engine.py:400] Traceback (most recent call last):
ERROR 03-09 00:16:32 engine.py:400] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 391, in run_mp_engine
ERROR 03-09 00:16:32 engine.py:400] engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
ERROR 03-09 00:16:32 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-09 00:16:32 engine.py:400] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 119, in from_engine_args
ERROR 03-09 00:16:32 engine.py:400] engine_config = engine_args.create_engine_config(usage_context)
ERROR 03-09 00:16:32 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-09 00:16:32 engine.py:400] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1127, in create_engine_config
ERROR 03-09 00:16:32 engine.py:400] model_config = self.create_model_config()
ERROR 03-09 00:16:32 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-09 00:16:32 engine.py:400] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1047, in create_model_config
ERROR 03-09 00:16:32 engine.py:400] return ModelConfig(
ERROR 03-09 00:16:32 engine.py:400] ^^^^^^^^^^^^
ERROR 03-09 00:16:32 engine.py:400] File "/usr/local/lib/python3.12/dist-packages/vllm/config.py", line 304, in __init__
ERROR 03-09 00:16:32 engine.py:400] hf_config = get_config(self.model, trust_remote_code, revision,
ERROR 03-09 00:16:32 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-09 00:16:32 engine.py:400] File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/config.py", line 287, in get_config
ERROR 03-09 00:16:32 engine.py:400] config = AutoConfig.from_pretrained(
ERROR 03-09 00:16:32 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-09 00:16:32 engine.py:400] File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 1084, in from_pretrained
ERROR 03-09 00:16:32 engine.py:400] config_class = get_class_from_dynamic_module(
ERROR 03-09 00:16:32 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-09 00:16:32 engine.py:400] File "/usr/local/lib/python3.12/dist-packages/transformers/dynamic_module_utils.py", line 541, in get_class_from_dynamic_module
ERROR 03-09 00:16:32 engine.py:400] final_module = get_cached_module_file(
ERROR 03-09 00:16:32 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-09 00:16:32 engine.py:400] File "/usr/local/lib/python3.12/dist-packages/transformers/dynamic_module_utils.py", line 345, in get_cached_module_file
ERROR 03-09 00:16:32 engine.py:400] resolved_module_file = cached_file(
ERROR 03-09 00:16:32 engine.py:400] ^^^^^^^^^^^^
ERROR 03-09 00:16:32 engine.py:400] File "/usr/local/lib/python3.12/dist-packages/transformers/utils/hub.py", line 313, in cached_file
ERROR 03-09 00:16:32 engine.py:400] raise EnvironmentError(
ERROR 03-09 00:16:32 engine.py:400] OSError: /models/deepseek-r1 does not appear to have a file named configuration_deepseek.py. Checkout 'https://huggingface.co//models/deepseek-r1/tree/None' for available files.
Process SpawnProcess-1:
Traceback (most recent call last):
File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 402, in run_mp_engine
raise e
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 391, in run_mp_engine
engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 119, in from_engine_args
engine_config = engine_args.create_engine_config(usage_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1127, in create_engine_config
model_config = self.create_model_config()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1047, in create_model_config
return ModelConfig(
^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/config.py", line 304, in __init__
hf_config = get_config(self.model, trust_remote_code, revision,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/config.py", line 287, in get_config
config = AutoConfig.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 1084, in from_pretrained
config_class = get_class_from_dynamic_module(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/dynamic_module_utils.py", line 541, in get_class_from_dynamic_module
final_module = get_cached_module_file(
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/dynamic_module_utils.py", line 345, in get_cached_module_file
resolved_module_file = cached_file(
^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/utils/hub.py", line 313, in cached_file
raise EnvironmentError(
OSError: /models/deepseek-r1 does not appear to have a file named configuration_deepseek.py. Checkout 'https://huggingface.co//models/deepseek-r1/tree/None' for available files.
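The traceback goes through get_class_from_dynamic_module, the path transformers takes when trust_remote_code is enabled and config.json points at custom code through an auto_map entry. Below is a minimal pre-flight sketch that lists the .py files a local model directory would need before vllm serve is started. The helper name missing_remote_code_files and the sample auto_map value are illustrative assumptions, not part of vLLM or AIBrix.

```python
import json
import tempfile
from pathlib import Path


def missing_remote_code_files(model_dir: str) -> list[str]:
    """Return the .py files referenced by config.json's auto_map that are absent."""
    root = Path(model_dir)
    config = json.loads((root / "config.json").read_text())
    missing = set()
    for target in config.get("auto_map", {}).values():
        # An entry is either one "module.ClassName" string or a list of them
        # (list items may be None).
        refs = target if isinstance(target, (list, tuple)) else [target]
        for ref in refs:
            if not ref:
                continue
            # Drop an optional "repo--" prefix, then the trailing class name.
            module = ref.split("--")[-1].rsplit(".", 1)[0]
            filename = module + ".py"
            if not (root / filename).is_file():
                missing.add(filename)
    return sorted(missing)


if __name__ == "__main__":
    # Hypothetical stand-in for /models/deepseek-r1: a config.json with an
    # auto_map entry (assumed value) but no accompanying .py file.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = {"auto_map": {"AutoConfig": "configuration_deepseek.DeepseekV3Config"}}
        (Path(tmp) / "config.json").write_text(json.dumps(cfg))
        print(missing_remote_code_files(tmp))  # ['configuration_deepseek.py']
```

Running a check like this at the end of the init container would surface the missing file before the engine process starts, instead of after a multi-minute download.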
Steps to Reproduce
apiVersion: orchestration.aibrix.ai/v1alpha1
kind: RayClusterFleet
metadata:
  labels:
    app.kubernetes.io/name: aibrix
    model.aibrix.ai/name: deepseek-r1-671b
    model.aibrix.ai/port: "8000"
  name: deepseek-r1-671b
spec:
  replicas: 1
  selector:
    matchLabels:
      model.aibrix.ai/name: deepseek-r1-671b
      model.aibrix.ai/port: "8000"
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        model.aibrix.ai/name: deepseek-r1-671b
        model.aibrix.ai/port: "8000"
      annotations:
        ray.io/overwrite-container-cmd: "true"
    spec:
      rayVersion: '2.40.0'
      headGroupSpec:
        rayStartParams:
          dashboard-host: '0.0.0.0'
          block: 'false'
        template:
          metadata:
            labels:
              model.aibrix.ai/name: deepseek-r1-671b
              model.aibrix.ai/port: "8000"
            annotations:
              k8s.volcengine.com/pod-networks: |
                [
                  {
                    "cniConf":{
                        "name":"rdma"
                    }
                  },
                  {
                    "cniConf":{
                        "name":"rdma"
                    }
                  },
                  {
                    "cniConf":{
                        "name":"rdma"
                    }
                  },
                  {
                    "cniConf":{
                        "name":"rdma"
                    }
                  },
                  {
                    "cniConf":{
                        "name":"rdma"
                    }
                  },
                  {
                    "cniConf":{
                        "name":"rdma"
                    }
                  },
                  {
                    "cniConf":{
                        "name":"rdma"
                    }
                  },
                  {
                    "cniConf":{
                        "name":"rdma"
                    }
                  }
                ]
          spec:
            initContainers:
              - name: init-model
                image: aibrix-container-registry-cn-beijing.cr.volces.com/aibrix/runtime:v0.2.0
                command:
                  - aibrix_download
                  - --model-uri
                  - tos://aibrix-artifact-testing/models/deepseek-r1/
                  - --local-dir
                  - /models/
                env:
                  - name: DOWNLOADER_MODEL_NAME
                    value: deepseek-r1
                  - name: DOWNLOADER_NUM_THREADS
                    value: "16"
                  - name: DOWNLOADER_ALLOW_FILE_SUFFIX
                    value: json, safetensors
                  - name: TOS_ACCESS_KEY
                    valueFrom:
                      secretKeyRef:
                        name: tos-credential
                        key: TOS_ACCESS_KEY
                  - name: TOS_SECRET_KEY
                    valueFrom:
                      secretKeyRef:
                        name: tos-credential
                        key: TOS_SECRET_KEY
                  - name: TOS_ENDPOINT
                    value: https://tos-s3-cn-beijing.ivolces.com
                  - name: TOS_REGION
                    value: cn-beijing
                volumeMounts:
                  - mountPath: /models
                    name: models
            containers:
              - name: ray-head
                image: aibrix-container-registry-cn-beijing.cr.volces.com/aibrix/vllm-openai:v0.7.3.self.post1
                ports:
                  - containerPort: 6379
                    name: gcs-server
                  - containerPort: 8265
                    name: dashboard
                  - containerPort: 10001
                    name: client
                  - containerPort: 8000
                    name: service
                command: ["/bin/bash", "-lc", "--"]
                args: ["ulimit -n 65536; echo head; $KUBERAY_GEN_RAY_START_CMD; vllm serve /models/deepseek-r1 --trust-remote-code --served-model-name deepseek-r1-671b --tensor-parallel-size 16 --distributed-executor-backend ray --uvicorn-log-level warning"]
                env:
                  - name: GLOO_SOCKET_IFNAME
                    value: eth0
                  - name: NCCL_SOCKET_IFNAME
                    value: eth0
                  - name: NCCL_IB_DISABLE
                    value: "0"
                  - name: NCCL_IB_HCA
                    value: mlx5_1:1,mlx5_2:1,mlx5_3:1,mlx5_4:1,mlx5_5:1,mlx5_6:1,mlx5_7:1,mlx5_8:1
                resources:
                  limits:
                    nvidia.com/gpu: 8
                    vke.volcengine.com/rdma: "8"
                  requests:
                    nvidia.com/gpu: 8
                    vke.volcengine.com/rdma: "8"
                securityContext:
                  capabilities:
                    add:
                      - IPC_LOCK
                startupProbe:
                  httpGet:
                    path: /metrics
                    port: service
                  initialDelaySeconds: 180
                  failureThreshold: 150
                  periodSeconds: 10
                terminationMessagePath: /dev/termination-log
                terminationMessagePolicy: File
                volumeMounts:
                  - mountPath: /dev/shm
                    name: shared-mem
                  - mountPath: /models
                    name: models
            volumes:
              - name: shared-mem
                emptyDir:
                  medium: Memory
              - name: models
                hostPath:
                  path: /mnt/nvme0/aibrix
                  type: DirectoryOrCreate
      workerGroupSpecs:
        - replicas: 1
          minReplicas: 1
          maxReplicas: 1
          groupName: worker-group
          rayStartParams: {}
          template:
            metadata:
              labels:
                model.aibrix.ai/name: deepseek-r1-671b
                model.aibrix.ai/port: "8000"
              annotations:
                k8s.volcengine.com/pod-networks: |
                  [
                    {
                      "cniConf":{
                          "name":"rdma"
                      }
                    },
                    {
                      "cniConf":{
                          "name":"rdma"
                      }
                    },
                    {
                      "cniConf":{
                          "name":"rdma"
                      }
                    },
                    {
                      "cniConf":{
                          "name":"rdma"
                      }
                    },
                    {
                      "cniConf":{
                          "name":"rdma"
                      }
                    },
                    {
                      "cniConf":{
                          "name":"rdma"
                      }
                    },
                    {
                      "cniConf":{
                          "name":"rdma"
                      }
                    },
                    {
                      "cniConf":{
                          "name":"rdma"
                      }
                    }
                  ]
            spec:
              initContainers:
                - name: init-model
                  image: aibrix-container-registry-cn-beijing.cr.volces.com/aibrix/runtime:v0.2.0
                  command:
                    - aibrix_download
                    - --model-uri
                    - tos://aibrix-artifact-testing/models/deepseek-r1/
                    - --local-dir
                    - /models/
                  env:
                    - name: DOWNLOADER_MODEL_NAME
                      value: deepseek-r1
                    - name: DOWNLOADER_NUM_THREADS
                      value: "16"
                    - name: DOWNLOADER_ALLOW_FILE_SUFFIX
                      value: json, safetensors
                    - name: TOS_ACCESS_KEY
                      valueFrom:
                        secretKeyRef:
                          name: tos-credential
                          key: TOS_ACCESS_KEY
                    - name: TOS_SECRET_KEY
                      valueFrom:
                        secretKeyRef:
                          name: tos-credential
                          key: TOS_SECRET_KEY
                    - name: TOS_ENDPOINT
                      value: https://tos-s3-cn-beijing.ivolces.com
                    - name: TOS_REGION
                      value: cn-beijing
                  volumeMounts:
                    - mountPath: /models
                      name: models
              containers:
                - name: ray-worker
                  image: aibrix-container-registry-cn-beijing.cr.volces.com/aibrix/vllm-openai:v0.7.3.self.post1
                  command: ["/bin/bash", "-lc", "--"]
                  args: ["ulimit -n 65536; echo head; $KUBERAY_GEN_RAY_START_CMD;"]
                  env:
                    - name: GLOO_SOCKET_IFNAME
                      value: eth0
                    - name: NCCL_SOCKET_IFNAME
                      value: eth0
                    - name: NCCL_IB_DISABLE
                      value: "0"
                    - name: NCCL_IB_HCA
                      value: mlx5_1:1,mlx5_2:1,mlx5_3:1,mlx5_4:1,mlx5_5:1,mlx5_6:1,mlx5_7:1,mlx5_8:1
                  lifecycle:
                    preStop:
                      exec:
                        command: [ "/bin/sh","-c","ray stop" ]
                  resources:
                    limits:
                      nvidia.com/gpu: 8
                      vke.volcengine.com/rdma: "8"
                    requests:
                      nvidia.com/gpu: 8
                      vke.volcengine.com/rdma: "8"
                  securityContext:
                    capabilities:
                      add:
                        - IPC_LOCK
                  terminationMessagePath: /dev/termination-log
                  terminationMessagePolicy: File
                  volumeMounts:
                    - mountPath: /dev/shm
                      name: shared-mem
                    - mountPath: /models
                      name: models
              volumes:
                - name: shared-mem
                  emptyDir:
                    medium: Memory
                - name: models
                  hostPath:
                    path: /mnt/nvme0/aibrix
                    type: DirectoryOrCreate
Expected behavior
vllm serve should load the model from /models/deepseek-r1 and start without errors.
Environment
v0.2.1
Files inside the container:
drwxr-xr-x 3 root root 16384 Mar 9 00:08 .
drwxr-xr-x 3 root root 4096 Mar 8 23:52 ..
drwxr-xr-x 3 root root 4096 Mar 8 23:52 .cache
-rw-r--r-- 1 root root 1729 Mar 8 23:52 config.json
-rw-r--r-- 1 root root 64 Mar 8 23:52 configuration.json
-rw-r--r-- 1 root root 171 Mar 8 23:52 generation_config.json
-rw-r--r-- 1 root root 5234139343 Mar 8 23:52 model-00001-of-000163.safetensors
-rw-r--r-- 1 root root 4302383966 Mar 8 23:52 model-00002-of-000163.safetensors
-rw-r--r-- 1 root root 4302384375 Mar 8 23:52 model-00003-of-000163.safetensors
-rw-r--r-- 1 root root 4302349996 Mar 8 23:52 model-00004-of-000163.safetensors
-rw-r--r-- 1 root root 4302384154 Mar 8 23:52 model-00005-of-000163.safetensors
-rw-r--r-- 1 root root 4372073602 Mar 8 23:52 model-00006-of-000163.safetensors
-rw-r--r-- 1 root root 4306080097 Mar 8 23:53 model-00007-of-000163.safetensors
-rw-r--r-- 1 root root 4302384356 Mar 8 23:53 model-00008-of-000163.safetensors
-rw-r--r-- 1 root root 4302350190 Mar 8 23:53 model-00009-of-000163.safetensors
-rw-r--r-- 1 root root 4302383960 Mar 8 23:53 model-00010-of-000163.safetensors
-rw-r--r-- 1 root root 4302384375 Mar 8 23:53 model-00011-of-000163.safetensors
-rw-r--r-- 1 root root 1321583941 Mar 8 23:53 model-00012-of-000163.safetensors
-rw-r--r-- 1 root root 4302317244 Mar 8 23:53 model-00013-of-000163.safetensors
-rw-r--r-- 1 root root 4302384328 Mar 8 23:53 model-00014-of-000163.safetensors
-rw-r--r-- 1 root root 4302350218 Mar 8 23:53 model-00015-of-000163.safetensors
-rw-r--r-- 1 root root 4302383932 Mar 8 23:53 model-00016-of-000163.safetensors
-rw-r--r-- 1 root root 4302384377 Mar 8 23:53 model-00017-of-000163.safetensors
-rw-r--r-- 1 root root 4302350026 Mar 8 23:54 model-00018-of-000163.safetensors
-rw-r--r-- 1 root root 4302384124 Mar 8 23:54 model-00019-of-000163.safetensors
-rw-r--r-- 1 root root 4302384377 Mar 8 23:54 model-00020-of-000163.safetensors
-rw-r--r-- 1 root root 4302350413 Mar 8 23:54 model-00021-of-000163.safetensors
-rw-r--r-- 1 root root 4302384900 Mar 8 23:54 model-00022-of-000163.safetensors
-rw-r--r-- 1 root root 4302350808 Mar 8 23:54 model-00023-of-000163.safetensors
-rw-r--r-- 1 root root 4302384504 Mar 8 23:54 model-00024-of-000163.safetensors
-rw-r--r-- 1 root root 4302384961 Mar 8 23:54 model-00025-of-000163.safetensors
-rw-r--r-- 1 root root 4302350620 Mar 8 23:54 model-00026-of-000163.safetensors
-rw-r--r-- 1 root root 4302384692 Mar 8 23:54 model-00027-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar 8 23:55 model-00028-of-000163.safetensors
-rw-r--r-- 1 root root 4302350448 Mar 8 23:55 model-00029-of-000163.safetensors
-rw-r--r-- 1 root root 4302384884 Mar 8 23:55 model-00030-of-000163.safetensors
-rw-r--r-- 1 root root 4302350824 Mar 8 23:55 model-00031-of-000163.safetensors
-rw-r--r-- 1 root root 4302384488 Mar 8 23:55 model-00032-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar 8 23:55 model-00033-of-000163.safetensors
-rw-r--r-- 1 root root 1747417474 Mar 8 23:55 model-00034-of-000163.safetensors
-rw-r--r-- 1 root root 4302317817 Mar 8 23:55 model-00035-of-000163.safetensors
-rw-r--r-- 1 root root 4302384914 Mar 8 23:55 model-00036-of-000163.safetensors
-rw-r--r-- 1 root root 4302350794 Mar 8 23:55 model-00037-of-000163.safetensors
-rw-r--r-- 1 root root 4302384518 Mar 8 23:56 model-00038-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar 8 23:56 model-00039-of-000163.safetensors
-rw-r--r-- 1 root root 4302350602 Mar 8 23:56 model-00040-of-000163.safetensors
-rw-r--r-- 1 root root 4302384710 Mar 8 23:56 model-00041-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar 8 23:56 model-00042-of-000163.safetensors
-rw-r--r-- 1 root root 4302350432 Mar 8 23:56 model-00043-of-000163.safetensors
-rw-r--r-- 1 root root 4302384900 Mar 8 23:56 model-00044-of-000163.safetensors
-rw-r--r-- 1 root root 4302350808 Mar 8 23:56 model-00045-of-000163.safetensors
-rw-r--r-- 1 root root 4302384504 Mar 8 23:56 model-00046-of-000163.safetensors
-rw-r--r-- 1 root root 4302384961 Mar 8 23:57 model-00047-of-000163.safetensors
-rw-r--r-- 1 root root 4302350620 Mar 8 23:57 model-00048-of-000163.safetensors
-rw-r--r-- 1 root root 4302384692 Mar 8 23:57 model-00049-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar 8 23:57 model-00050-of-000163.safetensors
-rw-r--r-- 1 root root 4302350448 Mar 8 23:57 model-00051-of-000163.safetensors
-rw-r--r-- 1 root root 4302384884 Mar 8 23:57 model-00052-of-000163.safetensors
-rw-r--r-- 1 root root 4302350824 Mar 8 23:57 model-00053-of-000163.safetensors
-rw-r--r-- 1 root root 4302384488 Mar 8 23:57 model-00054-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar 8 23:57 model-00055-of-000163.safetensors
-rw-r--r-- 1 root root 1747417474 Mar 8 23:57 model-00056-of-000163.safetensors
-rw-r--r-- 1 root root 4302317817 Mar 8 23:58 model-00057-of-000163.safetensors
-rw-r--r-- 1 root root 4302384914 Mar 8 23:58 model-00058-of-000163.safetensors
-rw-r--r-- 1 root root 4302350794 Mar 8 23:58 model-00059-of-000163.safetensors
-rw-r--r-- 1 root root 4302384518 Mar 8 23:58 model-00060-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar 8 23:58 model-00061-of-000163.safetensors
-rw-r--r-- 1 root root 4302350602 Mar 8 23:58 model-00062-of-000163.safetensors
-rw-r--r-- 1 root root 4302384710 Mar 8 23:58 model-00063-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar 8 23:58 model-00064-of-000163.safetensors
-rw-r--r-- 1 root root 4302350432 Mar 8 23:58 model-00065-of-000163.safetensors
-rw-r--r-- 1 root root 4302384900 Mar 8 23:58 model-00066-of-000163.safetensors
-rw-r--r-- 1 root root 4302350808 Mar 8 23:59 model-00067-of-000163.safetensors
-rw-r--r-- 1 root root 4302384504 Mar 8 23:59 model-00068-of-000163.safetensors
-rw-r--r-- 1 root root 4302384961 Mar 8 23:59 model-00069-of-000163.safetensors
-rw-r--r-- 1 root root 4302350620 Mar 8 23:59 model-00070-of-000163.safetensors
-rw-r--r-- 1 root root 4302384692 Mar 8 23:59 model-00071-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar 8 23:59 model-00072-of-000163.safetensors
-rw-r--r-- 1 root root 4302350448 Mar 8 23:59 model-00073-of-000163.safetensors
-rw-r--r-- 1 root root 4302384884 Mar 8 23:59 model-00074-of-000163.safetensors
-rw-r--r-- 1 root root 4302350824 Mar 8 23:59 model-00075-of-000163.safetensors
-rw-r--r-- 1 root root 4302384488 Mar 9 00:00 model-00076-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar 9 00:00 model-00077-of-000163.safetensors
-rw-r--r-- 1 root root 1747417474 Mar 9 00:00 model-00078-of-000163.safetensors
-rw-r--r-- 1 root root 4302317817 Mar 9 00:00 model-00079-of-000163.safetensors
-rw-r--r-- 1 root root 4302384914 Mar 9 00:00 model-00080-of-000163.safetensors
-rw-r--r-- 1 root root 4302350794 Mar 9 00:00 model-00081-of-000163.safetensors
-rw-r--r-- 1 root root 4302384518 Mar 9 00:00 model-00082-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar 9 00:00 model-00083-of-000163.safetensors
-rw-r--r-- 1 root root 4302350602 Mar 9 00:00 model-00084-of-000163.safetensors
-rw-r--r-- 1 root root 4302384710 Mar 9 00:00 model-00085-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar 9 00:00 model-00086-of-000163.safetensors
-rw-r--r-- 1 root root 4302350432 Mar 9 00:01 model-00087-of-000163.safetensors
-rw-r--r-- 1 root root 4302384900 Mar 9 00:01 model-00088-of-000163.safetensors
-rw-r--r-- 1 root root 4302350808 Mar 9 00:01 model-00089-of-000163.safetensors
-rw-r--r-- 1 root root 4302384504 Mar 9 00:01 model-00090-of-000163.safetensors
-rw-r--r-- 1 root root 4302384961 Mar 9 00:01 model-00091-of-000163.safetensors
-rw-r--r-- 1 root root 4302350620 Mar 9 00:01 model-00092-of-000163.safetensors
-rw-r--r-- 1 root root 4302384692 Mar 9 00:01 model-00093-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar 9 00:01 model-00094-of-000163.safetensors
-rw-r--r-- 1 root root 4302350448 Mar 9 00:01 model-00095-of-000163.safetensors
-rw-r--r-- 1 root root 4302384884 Mar 9 00:02 model-00096-of-000163.safetensors
-rw-r--r-- 1 root root 4302350824 Mar 9 00:02 model-00097-of-000163.safetensors
-rw-r--r-- 1 root root 4302384488 Mar 9 00:02 model-00098-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar 9 00:02 model-00099-of-000163.safetensors
-rw-r--r-- 1 root root 1747417474 Mar 9 00:02 model-00100-of-000163.safetensors
-rw-r--r-- 1 root root 4302317817 Mar 9 00:02 model-00101-of-000163.safetensors
-rw-r--r-- 1 root root 4302384914 Mar 9 00:02 model-00102-of-000163.safetensors
-rw-r--r-- 1 root root 4302350794 Mar 9 00:02 model-00103-of-000163.safetensors
-rw-r--r-- 1 root root 4302384518 Mar 9 00:02 model-00104-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar 9 00:02 model-00105-of-000163.safetensors
-rw-r--r-- 1 root root 4302350602 Mar 9 00:03 model-00106-of-000163.safetensors
-rw-r--r-- 1 root root 4302384710 Mar 9 00:03 model-00107-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar 9 00:03 model-00108-of-000163.safetensors
-rw-r--r-- 1 root root 4302350432 Mar 9 00:03 model-00109-of-000163.safetensors
-rw-r--r-- 1 root root 4302384900 Mar 9 00:03 model-00110-of-000163.safetensors
-rw-r--r-- 1 root root 4302350808 Mar 9 00:03 model-00111-of-000163.safetensors
-rw-r--r-- 1 root root 4302384504 Mar 9 00:03 model-00112-of-000163.safetensors
-rw-r--r-- 1 root root 4302384961 Mar 9 00:03 model-00113-of-000163.safetensors
-rw-r--r-- 1 root root 4302350620 Mar 9 00:03 model-00114-of-000163.safetensors
-rw-r--r-- 1 root root 4302384692 Mar 9 00:03 model-00115-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar 9 00:04 model-00116-of-000163.safetensors
-rw-r--r-- 1 root root 4302350448 Mar 9 00:04 model-00117-of-000163.safetensors
-rw-r--r-- 1 root root 4302384884 Mar 9 00:04 model-00118-of-000163.safetensors
-rw-r--r-- 1 root root 4302350824 Mar 9 00:04 model-00119-of-000163.safetensors
-rw-r--r-- 1 root root 4302384488 Mar 9 00:04 model-00120-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar 9 00:04 model-00121-of-000163.safetensors
-rw-r--r-- 1 root root 1747417474 Mar 9 00:04 model-00122-of-000163.safetensors
-rw-r--r-- 1 root root 4302317817 Mar 9 00:04 model-00123-of-000163.safetensors
-rw-r--r-- 1 root root 4302384914 Mar 9 00:04 model-00124-of-000163.safetensors
-rw-r--r-- 1 root root 4302350794 Mar 9 00:04 model-00125-of-000163.safetensors
-rw-r--r-- 1 root root 4302384518 Mar 9 00:05 model-00126-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar 9 00:05 model-00127-of-000163.safetensors
-rw-r--r-- 1 root root 4302350602 Mar 9 00:05 model-00128-of-000163.safetensors
-rw-r--r-- 1 root root 4302384710 Mar 9 00:05 model-00129-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar 9 00:05 model-00130-of-000163.safetensors
-rw-r--r-- 1 root root 4302350432 Mar 9 00:05 model-00131-of-000163.safetensors
-rw-r--r-- 1 root root 4302384900 Mar 9 00:05 model-00132-of-000163.safetensors
-rw-r--r-- 1 root root 4302350808 Mar 9 00:05 model-00133-of-000163.safetensors
-rw-r--r-- 1 root root 4302384504 Mar 9 00:05 model-00134-of-000163.safetensors
-rw-r--r-- 1 root root 4302384961 Mar 9 00:05 model-00135-of-000163.safetensors
-rw-r--r-- 1 root root 4302350620 Mar 9 00:06 model-00136-of-000163.safetensors
-rw-r--r-- 1 root root 4302384692 Mar 9 00:06 model-00137-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar 9 00:06 model-00138-of-000163.safetensors
-rw-r--r-- 1 root root 4302350448 Mar 9 00:06 model-00139-of-000163.safetensors
-rw-r--r-- 1 root root 4302384884 Mar 9 00:06 model-00140-of-000163.safetensors
-rw-r--r-- 1 root root 3142388798 Mar 9 00:06 model-00141-of-000163.safetensors
-rw-r--r-- 1 root root 4302317817 Mar 9 00:06 model-00142-of-000163.safetensors
-rw-r--r-- 1 root root 4302384914 Mar 9 00:06 model-00143-of-000163.safetensors
-rw-r--r-- 1 root root 4302350794 Mar 9 00:06 model-00144-of-000163.safetensors
-rw-r--r-- 1 root root 4302384518 Mar 9 00:06 model-00145-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar 9 00:07 model-00146-of-000163.safetensors
-rw-r--r-- 1 root root 4302350602 Mar 9 00:07 model-00147-of-000163.safetensors
-rw-r--r-- 1 root root 4302384710 Mar 9 00:07 model-00148-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar 9 00:07 model-00149-of-000163.safetensors
-rw-r--r-- 1 root root 4302350432 Mar 9 00:07 model-00150-of-000163.safetensors
-rw-r--r-- 1 root root 4302384900 Mar 9 00:07 model-00151-of-000163.safetensors
-rw-r--r-- 1 root root 4302350808 Mar 9 00:07 model-00152-of-000163.safetensors
-rw-r--r-- 1 root root 4302384504 Mar 9 00:07 model-00153-of-000163.safetensors
-rw-r--r-- 1 root root 4302384961 Mar 9 00:07 model-00154-of-000163.safetensors
-rw-r--r-- 1 root root 4302350620 Mar 9 00:08 model-00155-of-000163.safetensors
-rw-r--r-- 1 root root 4302384692 Mar 9 00:08 model-00156-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar 9 00:08 model-00157-of-000163.safetensors
-rw-r--r-- 1 root root 4302350448 Mar 9 00:08 model-00158-of-000163.safetensors
-rw-r--r-- 1 root root 4302384884 Mar 9 00:08 model-00159-of-000163.safetensors
-rw-r--r-- 1 root root 5230637362 Mar 9 00:08 model-00160-of-000163.safetensors
-rw-r--r-- 1 root root 4302384321 Mar 9 00:08 model-00161-of-000163.safetensors
-rw-r--r-- 1 root root 4302384948 Mar 9 00:08 model-00162-of-000163.safetensors
-rw-r--r-- 1 root root 6584784447 Mar 9 00:08 model-00163-of-000163.safetensors
-rw-r--r-- 1 root root 8898324 Mar 9 00:08 model.safetensors.index.json
-rw-r--r-- 1 root root 7847602 Mar 9 00:08 tokenizer.json
-rw-r--r-- 1 root root 3584 Mar 9 00:08 tokenizer_config.json
I initially used the following configuration, but https://huggingface.co/deepseek-ai/DeepSeek-R1/tree/main contains an important file named configuration_deepseek.py, which this suffix filter skips:
- name: DOWNLOADER_ALLOW_FILE_SUFFIX
  value: json, safetensors
Changing the value to json, safetensors, py resolves the issue. I would call this a misconfiguration, but there is room for improvement: the downloader should use an ignore pattern to block .bin and similar files instead of relying on an allowlist.
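To illustrate the difference between the two filtering approaches, here is a small sketch. It only mimics the behaviour described in this thread; it is not the AIBrix downloader's code, and the function names, the ignore patterns, and the pytorch_model.bin entry are made up for the example.

```python
from fnmatch import fnmatch


def keep_by_suffix_allowlist(names, allowed_suffixes):
    """Keep only files whose extension appears in a comma-separated allowlist."""
    allowed = {s.strip().lstrip(".") for s in allowed_suffixes.split(",")}
    return [n for n in names if n.rsplit(".", 1)[-1] in allowed]


def keep_by_ignore_patterns(names, ignore_patterns):
    """Keep every file except those matching one of the glob patterns."""
    return [n for n in names if not any(fnmatch(n, p) for p in ignore_patterns)]


# Sample file names; pytorch_model.bin is a hypothetical unwanted file.
files = [
    "config.json",
    "configuration_deepseek.py",
    "model-00001-of-000163.safetensors",
    "pytorch_model.bin",
]

if __name__ == "__main__":
    # The allowlist silently drops configuration_deepseek.py.
    print(keep_by_suffix_allowlist(files, "json, safetensors"))
    # The ignore list keeps it and only drops the .bin file.
    print(keep_by_ignore_patterns(files, ["*.bin", "*.pt"]))
```

With an allowlist, every new file type a model repo introduces has to be added by hand; with an ignore list, unknown files are downloaded by default, which is the safer failure mode for remote-code models.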
When this command runs, $KUBERAY_GEN_RAY_START_CMD blocks the process, so the subsequent vllm serve command never executes:
vllm serve /models/deepseek-r1 --trust-remote-code --served-model-name deepseek-r1-671b --tensor-parallel-size 16 --distributed-executor-backend ray --uvicorn-log-level warning
So the args should run the Ray start command in the background instead:
"ulimit -n 65536; echo head; nohup $KUBERAY_GEN_RAY_START_CMD > /tmp/ray_start.log 2>&1 & vllm serve /models/deepseek-r1 --trust-remote-code --served-model-name deepseek-r1-671b --tensor-parallel-size 16 --distributed-executor-backend ray --uvicorn-log-level warning"
@ying2025 which kuberay version are you using?
ray, version 2.42.0
- Did you add block: 'false' in the rayStartParams? This is required to remove the --block in the startup command.
- Underneath, the kuberay operator has a bug on disabling --block; we fixed it and built an image aibrix/kuberay-operator:v1.2.1-patch. We are working with upstream to bring that change back soon. Feel free to confirm you are using this version. You can run kubectl describe deployment aibrix-kuberay-operator -n aibrix-system to verify it.
Some docs might be outdated. Feel free to check https://github.com/vllm-project/aibrix/blob/main/samples/deepseek-r1/deepseek-r1-huggingface.yaml as an example.
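One way to confirm whether the operator actually removed --block is to inspect the generated start command inside the head container. The sketch below assumes the command is available in the KUBERAY_GEN_RAY_START_CMD environment variable, as the args in this issue suggest; the helper name has_block_flag is hypothetical.

```python
import os
import shlex


def has_block_flag(start_cmd: str) -> bool:
    """Return True if the ray start command contains a standalone --block token."""
    return "--block" in shlex.split(start_cmd)


if __name__ == "__main__":
    # Read the generated command from the environment; empty if not set.
    cmd = os.environ.get("KUBERAY_GEN_RAY_START_CMD", "")
    if has_block_flag(cmd):
        print("ray start will block; a following 'vllm serve' would never run")
    else:
        print("no --block flag found in the generated start command")
```

If the flag is still present with block: 'false' set, that points at the operator version rather than at the manifest.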
OK, after I updated the kuberay-operator and added block: 'false' in the rayStartParams, it works.
@ying2025 thanks for the confirmation
Regarding point 2 (v1.2.1-patch): will this code change be merged into ray, or can you provide the relevant code?
@ying2025 Yeah, it will be part of kuberay. I am asking an engineer to help with it. Here's the code branch https://github.com/ray-project/kuberay/commit/91e1c26fbf1fc0f505ff7d16b70cf8228ed62ec4#diff-cc9abb27aaceca3f10193e2ab35fb00dca44b8858709c5c0f4df751c1387291aR576 and the original issue https://github.com/vllm-project/aibrix/issues/245#issuecomment-2394811082
ok, thanks