OSError: /models/deepseek-r1 does not appear to have a file named configuration_deepseek.py. Checkout 'https://huggingface.co//models/deepseek-r1/tree/None' for available files.

Open Jeffwan opened this issue 9 months ago • 8 comments

🐛 Describe the bug

INFO 03-09 00:16:28 api_server.py:913] args: Namespace(subparser='serve', model_tag='/models/deepseek-r1', config='', host=None, port=8000, uvicorn_log_level='warning', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, enable_reasoning=False, reasoning_parser=None, tool_call_parser=None, tool_parser_plugin='', model='/models/deepseek-r1', task='auto', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=True, allowed_local_media_path=None, download_dir=None, load_format='auto', config_format=<ConfigFormat.AUTO: 'auto'>, dtype='auto', kv_cache_dtype='auto', max_model_len=None, guided_decoding_backend='xgrammar', logits_processor_pattern=None, model_impl='auto', distributed_executor_backend='ray', pipeline_parallel_size=1, tensor_parallel_size=16, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=None, enable_prefix_caching=None, disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=0, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_partial_prefills=1, max_long_partial_prefills=1, long_prefill_token_threshold=0, max_num_seqs=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, 
limit_mm_per_prompt=None, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=['deepseek-r1-671b'], qlora_adapter_name_or_path=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', scheduler_cls='vllm.core.scheduler.Scheduler', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', generation_config=None, override_generation_config=None, enable_sleep_mode=False, calculate_kv_scales=False, additional_config=None, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False, dispatch_function=<function ServeSubcommand.cmd at 0x7f0b77561d00>)
INFO 03-09 00:16:28 api_server.py:209] Started engine process with PID 734
Could not locate the configuration_deepseek.py inside /models/deepseek-r1.
Traceback (most recent call last):
  File "/usr/local/bin/vllm", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 73, in main
    args.dispatch_function(args)
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 34, in cmd
    uvloop.run(run_server(args))
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
    return __asyncio.run(
           ^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
           ^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 947, in run_server
    async with build_async_engine_client(args) as engine_client:
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 139, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 220, in build_async_engine_client_from_engine_args
    engine_config = engine_args.create_engine_config()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1127, in create_engine_config
    model_config = self.create_model_config()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1047, in create_model_config
    return ModelConfig(
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/config.py", line 304, in __init__
    hf_config = get_config(self.model, trust_remote_code, revision,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/config.py", line 287, in get_config
    config = AutoConfig.from_pretrained(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 1084, in from_pretrained
    config_class = get_class_from_dynamic_module(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/dynamic_module_utils.py", line 541, in get_class_from_dynamic_module
    final_module = get_cached_module_file(
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/dynamic_module_utils.py", line 345, in get_cached_module_file
    resolved_module_file = cached_file(
                           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/utils/hub.py", line 313, in cached_file
    raise EnvironmentError(
OSError: /models/deepseek-r1 does not appear to have a file named configuration_deepseek.py. Checkout 'https://huggingface.co//models/deepseek-r1/tree/None' for available files.
INFO 03-09 00:16:31 __init__.py:207] Automatically detected platform cuda.
Could not locate the configuration_deepseek.py inside /models/deepseek-r1.
ERROR 03-09 00:16:32 engine.py:400] /models/deepseek-r1 does not appear to have a file named configuration_deepseek.py. Checkout 'https://huggingface.co//models/deepseek-r1/tree/None' for available files.
ERROR 03-09 00:16:32 engine.py:400] Traceback (most recent call last):
ERROR 03-09 00:16:32 engine.py:400]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 391, in run_mp_engine
ERROR 03-09 00:16:32 engine.py:400]     engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
ERROR 03-09 00:16:32 engine.py:400]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-09 00:16:32 engine.py:400]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 119, in from_engine_args
ERROR 03-09 00:16:32 engine.py:400]     engine_config = engine_args.create_engine_config(usage_context)
ERROR 03-09 00:16:32 engine.py:400]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-09 00:16:32 engine.py:400]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1127, in create_engine_config
ERROR 03-09 00:16:32 engine.py:400]     model_config = self.create_model_config()
ERROR 03-09 00:16:32 engine.py:400]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-09 00:16:32 engine.py:400]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1047, in create_model_config
ERROR 03-09 00:16:32 engine.py:400]     return ModelConfig(
ERROR 03-09 00:16:32 engine.py:400]            ^^^^^^^^^^^^
ERROR 03-09 00:16:32 engine.py:400]   File "/usr/local/lib/python3.12/dist-packages/vllm/config.py", line 304, in __init__
ERROR 03-09 00:16:32 engine.py:400]     hf_config = get_config(self.model, trust_remote_code, revision,
ERROR 03-09 00:16:32 engine.py:400]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-09 00:16:32 engine.py:400]   File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/config.py", line 287, in get_config
ERROR 03-09 00:16:32 engine.py:400]     config = AutoConfig.from_pretrained(
ERROR 03-09 00:16:32 engine.py:400]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-09 00:16:32 engine.py:400]   File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 1084, in from_pretrained
ERROR 03-09 00:16:32 engine.py:400]     config_class = get_class_from_dynamic_module(
ERROR 03-09 00:16:32 engine.py:400]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-09 00:16:32 engine.py:400]   File "/usr/local/lib/python3.12/dist-packages/transformers/dynamic_module_utils.py", line 541, in get_class_from_dynamic_module
ERROR 03-09 00:16:32 engine.py:400]     final_module = get_cached_module_file(
ERROR 03-09 00:16:32 engine.py:400]                    ^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-09 00:16:32 engine.py:400]   File "/usr/local/lib/python3.12/dist-packages/transformers/dynamic_module_utils.py", line 345, in get_cached_module_file
ERROR 03-09 00:16:32 engine.py:400]     resolved_module_file = cached_file(
ERROR 03-09 00:16:32 engine.py:400]                            ^^^^^^^^^^^^
ERROR 03-09 00:16:32 engine.py:400]   File "/usr/local/lib/python3.12/dist-packages/transformers/utils/hub.py", line 313, in cached_file
ERROR 03-09 00:16:32 engine.py:400]     raise EnvironmentError(
ERROR 03-09 00:16:32 engine.py:400] OSError: /models/deepseek-r1 does not appear to have a file named configuration_deepseek.py. Checkout 'https://huggingface.co//models/deepseek-r1/tree/None' for available files.
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 402, in run_mp_engine
    raise e
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 391, in run_mp_engine
    engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 119, in from_engine_args
    engine_config = engine_args.create_engine_config(usage_context)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1127, in create_engine_config
    model_config = self.create_model_config()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1047, in create_model_config
    return ModelConfig(
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/config.py", line 304, in __init__
    hf_config = get_config(self.model, trust_remote_code, revision,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/config.py", line 287, in get_config
    config = AutoConfig.from_pretrained(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 1084, in from_pretrained
    config_class = get_class_from_dynamic_module(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/dynamic_module_utils.py", line 541, in get_class_from_dynamic_module
    final_module = get_cached_module_file(
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/dynamic_module_utils.py", line 345, in get_cached_module_file
    resolved_module_file = cached_file(
                           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/utils/hub.py", line 313, in cached_file
    raise EnvironmentError(
OSError: /models/deepseek-r1 does not appear to have a file named configuration_deepseek.py. Checkout 'https://huggingface.co//models/deepseek-r1/tree/None' for available files.
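
The traceback ends in the transformers dynamic-module loader (`get_class_from_dynamic_module`), which is normally reached when `config.json` carries an `auto_map` entry and `--trust-remote-code` is set. Below is a minimal sketch for listing the Python files such an entry refers to that are missing from a local model directory. The helper name `missing_remote_code_files` is hypothetical, and the `module.ClassName` layout of `auto_map` values is an assumption based on how Hugging Face custom-code repositories are commonly laid out.

```python
import json
from pathlib import Path


def missing_remote_code_files(model_dir):
    """Return module files referenced by config.json's auto_map that are absent.

    Hypothetical helper for illustration; not part of vLLM or transformers.
    """
    model_path = Path(model_dir)
    config = json.loads((model_path / "config.json").read_text())
    missing = set()
    for ref in config.get("auto_map", {}).values():
        # A value may be a single string or a list of strings.
        refs = ref if isinstance(ref, (list, tuple)) else [ref]
        for item in refs:
            if not item:
                continue
            # Drop an optional "repo_id--" prefix, then keep the module part
            # of "module.ClassName" (assumed layout).
            module = item.split("--")[-1].rsplit(".", 1)[0]
            filename = module + ".py"
            if not (model_path / filename).exists():
                missing.add(filename)
    return sorted(missing)
```

Running `missing_remote_code_files("/models/deepseek-r1")` inside the container would show whether `configuration_deepseek.py` is the only referenced file that is absent.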

Steps to Reproduce

apiVersion: orchestration.aibrix.ai/v1alpha1
kind: RayClusterFleet
metadata:
  labels:
    app.kubernetes.io/name: aibrix
    model.aibrix.ai/name: deepseek-r1-671b
    model.aibrix.ai/port: "8000"
  name: deepseek-r1-671b
spec:
  replicas: 1
  selector:
    matchLabels:
      model.aibrix.ai/name: deepseek-r1-671b
      model.aibrix.ai/port: "8000"
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        model.aibrix.ai/name: deepseek-r1-671b
        model.aibrix.ai/port: "8000"
      annotations:
        ray.io/overwrite-container-cmd: "true"
    spec:
      rayVersion: '2.40.0'
      headGroupSpec:
        rayStartParams:
          dashboard-host: '0.0.0.0'
          block: 'false'
        template:
          metadata:
            labels:
              model.aibrix.ai/name: deepseek-r1-671b
              model.aibrix.ai/port: "8000"
            annotations:
              k8s.volcengine.com/pod-networks: |
                [
                  {
                    "cniConf":{
                        "name":"rdma"
                    }
                  },
                  {
                    "cniConf":{
                        "name":"rdma"
                    }
                  },
                  {
                    "cniConf":{
                        "name":"rdma"
                    }
                  },
                  {
                    "cniConf":{
                        "name":"rdma"
                    }
                  },
                  {
                    "cniConf":{
                        "name":"rdma"
                    }
                  },
                  {
                    "cniConf":{
                        "name":"rdma"
                    }
                  },
                  {
                    "cniConf":{
                        "name":"rdma"
                    }
                  },
                  {
                    "cniConf":{
                        "name":"rdma"
                    }
                  }
                ]
          spec:
            initContainers:
            - name: init-model
              image: aibrix-container-registry-cn-beijing.cr.volces.com/aibrix/runtime:v0.2.0
              command:
                - aibrix_download
                - --model-uri
                - tos://aibrix-artifact-testing/models/deepseek-r1/
                - --local-dir
                - /models/
              env:
                - name: DOWNLOADER_MODEL_NAME
                  value: deepseek-r1
                - name: DOWNLOADER_NUM_THREADS
                  value: "16"
                - name: DOWNLOADER_ALLOW_FILE_SUFFIX
                  value: json, safetensors
                - name: TOS_ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      name: tos-credential
                      key: TOS_ACCESS_KEY
                - name: TOS_SECRET_KEY
                  valueFrom:
                    secretKeyRef:
                      name: tos-credential
                      key: TOS_SECRET_KEY
                - name: TOS_ENDPOINT
                  value: https://tos-s3-cn-beijing.ivolces.com
                - name: TOS_REGION
                  value: cn-beijing
              volumeMounts:
                - mountPath: /models
                  name: models
            containers:
            - name: ray-head
              image: aibrix-container-registry-cn-beijing.cr.volces.com/aibrix/vllm-openai:v0.7.3.self.post1
              ports:
              - containerPort: 6379
                name: gcs-server
              - containerPort: 8265
                name: dashboard
              - containerPort: 10001
                name: client
              - containerPort: 8000
                name: service
              command: ["/bin/bash", "-lc", "--"]
              args: ["ulimit -n 65536; echo head; $KUBERAY_GEN_RAY_START_CMD; vllm serve /models/deepseek-r1 --trust-remote-code --served-model-name deepseek-r1-671b --tensor-parallel-size 16 --distributed-executor-backend ray --uvicorn-log-level warning"]
              env:
              - name: GLOO_SOCKET_IFNAME
                value: eth0
              - name: NCCL_SOCKET_IFNAME
                value: eth0
              - name: NCCL_IB_DISABLE
                value: "0"
              - name: NCCL_IB_HCA
                value: mlx5_1:1,mlx5_2:1,mlx5_3:1,mlx5_4:1,mlx5_5:1,mlx5_6:1,mlx5_7:1,mlx5_8:1
              resources:
                limits:
                  nvidia.com/gpu: 8
                  vke.volcengine.com/rdma: "8"
                requests:
                  nvidia.com/gpu: 8
                  vke.volcengine.com/rdma: "8"
              securityContext:
                capabilities:
                  add:
                  - IPC_LOCK
              startupProbe:
                httpGet:
                  path: /metrics
                  port: service
                initialDelaySeconds: 180
                failureThreshold: 150
                periodSeconds: 10
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
              volumeMounts:
                - mountPath: /dev/shm
                  name: shared-mem
                - mountPath: /models
                  name: models
            volumes:
              - name: shared-mem
                emptyDir:
                  medium: Memory
              - name: models
                hostPath:
                  path: /mnt/nvme0/aibrix
                  type: DirectoryOrCreate
      workerGroupSpecs:
      - replicas: 1
        minReplicas: 1
        maxReplicas: 1
        groupName: worker-group
        rayStartParams: {}
        template:
          metadata:
            labels:
              model.aibrix.ai/name: deepseek-r1-671b
              model.aibrix.ai/port: "8000"
            annotations:
              k8s.volcengine.com/pod-networks: |
                [
                  {
                    "cniConf":{
                        "name":"rdma"
                    }
                  },
                  {
                    "cniConf":{
                        "name":"rdma"
                    }
                  },
                  {
                    "cniConf":{
                        "name":"rdma"
                    }
                  },
                  {
                    "cniConf":{
                        "name":"rdma"
                    }
                  },
                  {
                    "cniConf":{
                        "name":"rdma"
                    }
                  },
                  {
                    "cniConf":{
                        "name":"rdma"
                    }
                  },
                  {
                    "cniConf":{
                        "name":"rdma"
                    }
                  },
                  {
                    "cniConf":{
                        "name":"rdma"
                    }
                  }
                ]
          spec:
            initContainers:
            - name: init-model
              image: aibrix-container-registry-cn-beijing.cr.volces.com/aibrix/runtime:v0.2.0
              command:
                - aibrix_download
                - --model-uri
                - tos://aibrix-artifact-testing/models/deepseek-r1/
                - --local-dir
                - /models/
              env:
                - name: DOWNLOADER_MODEL_NAME
                  value: deepseek-r1
                - name: DOWNLOADER_NUM_THREADS
                  value: "16"
                - name: DOWNLOADER_ALLOW_FILE_SUFFIX
                  value: json, safetensors
                - name: TOS_ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      name: tos-credential
                      key: TOS_ACCESS_KEY
                - name: TOS_SECRET_KEY
                  valueFrom:
                    secretKeyRef:
                      name: tos-credential
                      key: TOS_SECRET_KEY
                - name: TOS_ENDPOINT
                  value: https://tos-s3-cn-beijing.ivolces.com
                - name: TOS_REGION
                  value: cn-beijing
              volumeMounts:
                - mountPath: /models
                  name: models
            containers:
            - name: ray-worker
              image: aibrix-container-registry-cn-beijing.cr.volces.com/aibrix/vllm-openai:v0.7.3.self.post1
              command: ["/bin/bash", "-lc", "--"]
              args: ["ulimit -n 65536; echo head; $KUBERAY_GEN_RAY_START_CMD;"]
              env:
              - name: GLOO_SOCKET_IFNAME
                value: eth0
              - name: NCCL_SOCKET_IFNAME
                value: eth0
              - name: NCCL_IB_DISABLE
                value: "0"
              - name: NCCL_IB_HCA
                value: mlx5_1:1,mlx5_2:1,mlx5_3:1,mlx5_4:1,mlx5_5:1,mlx5_6:1,mlx5_7:1,mlx5_8:1
              lifecycle:
                preStop:
                  exec:
                    command: [ "/bin/sh","-c","ray stop" ]
              resources:
                limits:
                  nvidia.com/gpu: 8
                  vke.volcengine.com/rdma: "8"
                requests:
                  nvidia.com/gpu: 8
                  vke.volcengine.com/rdma: "8"
              securityContext:
                capabilities:
                  add:
                    - IPC_LOCK
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
              volumeMounts:
                - mountPath: /dev/shm
                  name: shared-mem
                - mountPath: /models
                  name: models
            volumes:
              - name: shared-mem
                emptyDir:
                  medium: Memory
              - name: models
                hostPath:
                  path: /mnt/nvme0/aibrix
                  type: DirectoryOrCreate
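
Both init containers set `DOWNLOADER_ALLOW_FILE_SUFFIX` to `json, safetensors`. If the downloader keeps only files whose suffix is on that list (an assumption; the actual `aibrix_download` logic is not shown in this report), a `.py` file such as `configuration_deepseek.py` would be skipped. Here is a minimal sketch of that kind of filter, using a hypothetical helper `allowed_by_suffix`:

```python
def allowed_by_suffix(filename, allow_list):
    """Return True if filename ends with one of the comma-separated suffixes.

    Hypothetical illustration of a suffix allow-list; this is not the
    aibrix_download implementation.
    """
    suffixes = [s.strip().lstrip(".") for s in allow_list.split(",") if s.strip()]
    return any(filename.endswith("." + s) for s in suffixes)


allow = "json, safetensors"  # value taken from the manifest above
for name in [
    "config.json",
    "model-00001-of-000163.safetensors",
    "configuration_deepseek.py",
]:
    print(name, allowed_by_suffix(name, allow))
```

With the manifest's value, the sketch prints `True` for the two files that match the list and `False` for `configuration_deepseek.py`. Under the same assumption, a list that also contains `py` would let that file through.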

Expected behavior

`vllm serve` should load the model from /models/deepseek-r1 and start serving without raising this OSError.

Environment

v0.2.1

Jeffwan, Mar 09 '25 08:03

Files inside the container:

drwxr-xr-x 3 root root      16384 Mar  9 00:08 .
drwxr-xr-x 3 root root       4096 Mar  8 23:52 ..
drwxr-xr-x 3 root root       4096 Mar  8 23:52 .cache
-rw-r--r-- 1 root root       1729 Mar  8 23:52 config.json
-rw-r--r-- 1 root root         64 Mar  8 23:52 configuration.json
-rw-r--r-- 1 root root        171 Mar  8 23:52 generation_config.json
-rw-r--r-- 1 root root 5234139343 Mar  8 23:52 model-00001-of-000163.safetensors
-rw-r--r-- 1 root root 4302383966 Mar  8 23:52 model-00002-of-000163.safetensors
-rw-r--r-- 1 root root 4302384375 Mar  8 23:52 model-00003-of-000163.safetensors
-rw-r--r-- 1 root root 4302349996 Mar  8 23:52 model-00004-of-000163.safetensors
-rw-r--r-- 1 root root 4302384154 Mar  8 23:52 model-00005-of-000163.safetensors
-rw-r--r-- 1 root root 4372073602 Mar  8 23:52 model-00006-of-000163.safetensors
-rw-r--r-- 1 root root 4306080097 Mar  8 23:53 model-00007-of-000163.safetensors
-rw-r--r-- 1 root root 4302384356 Mar  8 23:53 model-00008-of-000163.safetensors
-rw-r--r-- 1 root root 4302350190 Mar  8 23:53 model-00009-of-000163.safetensors
-rw-r--r-- 1 root root 4302383960 Mar  8 23:53 model-00010-of-000163.safetensors
-rw-r--r-- 1 root root 4302384375 Mar  8 23:53 model-00011-of-000163.safetensors
-rw-r--r-- 1 root root 1321583941 Mar  8 23:53 model-00012-of-000163.safetensors
-rw-r--r-- 1 root root 4302317244 Mar  8 23:53 model-00013-of-000163.safetensors
-rw-r--r-- 1 root root 4302384328 Mar  8 23:53 model-00014-of-000163.safetensors
-rw-r--r-- 1 root root 4302350218 Mar  8 23:53 model-00015-of-000163.safetensors
-rw-r--r-- 1 root root 4302383932 Mar  8 23:53 model-00016-of-000163.safetensors
-rw-r--r-- 1 root root 4302384377 Mar  8 23:53 model-00017-of-000163.safetensors
-rw-r--r-- 1 root root 4302350026 Mar  8 23:54 model-00018-of-000163.safetensors
-rw-r--r-- 1 root root 4302384124 Mar  8 23:54 model-00019-of-000163.safetensors
-rw-r--r-- 1 root root 4302384377 Mar  8 23:54 model-00020-of-000163.safetensors
-rw-r--r-- 1 root root 4302350413 Mar  8 23:54 model-00021-of-000163.safetensors
-rw-r--r-- 1 root root 4302384900 Mar  8 23:54 model-00022-of-000163.safetensors
-rw-r--r-- 1 root root 4302350808 Mar  8 23:54 model-00023-of-000163.safetensors
-rw-r--r-- 1 root root 4302384504 Mar  8 23:54 model-00024-of-000163.safetensors
-rw-r--r-- 1 root root 4302384961 Mar  8 23:54 model-00025-of-000163.safetensors
-rw-r--r-- 1 root root 4302350620 Mar  8 23:54 model-00026-of-000163.safetensors
-rw-r--r-- 1 root root 4302384692 Mar  8 23:54 model-00027-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar  8 23:55 model-00028-of-000163.safetensors
-rw-r--r-- 1 root root 4302350448 Mar  8 23:55 model-00029-of-000163.safetensors
-rw-r--r-- 1 root root 4302384884 Mar  8 23:55 model-00030-of-000163.safetensors
-rw-r--r-- 1 root root 4302350824 Mar  8 23:55 model-00031-of-000163.safetensors
-rw-r--r-- 1 root root 4302384488 Mar  8 23:55 model-00032-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar  8 23:55 model-00033-of-000163.safetensors
-rw-r--r-- 1 root root 1747417474 Mar  8 23:55 model-00034-of-000163.safetensors
-rw-r--r-- 1 root root 4302317817 Mar  8 23:55 model-00035-of-000163.safetensors
-rw-r--r-- 1 root root 4302384914 Mar  8 23:55 model-00036-of-000163.safetensors
-rw-r--r-- 1 root root 4302350794 Mar  8 23:55 model-00037-of-000163.safetensors
-rw-r--r-- 1 root root 4302384518 Mar  8 23:56 model-00038-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar  8 23:56 model-00039-of-000163.safetensors
-rw-r--r-- 1 root root 4302350602 Mar  8 23:56 model-00040-of-000163.safetensors
-rw-r--r-- 1 root root 4302384710 Mar  8 23:56 model-00041-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar  8 23:56 model-00042-of-000163.safetensors
-rw-r--r-- 1 root root 4302350432 Mar  8 23:56 model-00043-of-000163.safetensors
-rw-r--r-- 1 root root 4302384900 Mar  8 23:56 model-00044-of-000163.safetensors
-rw-r--r-- 1 root root 4302350808 Mar  8 23:56 model-00045-of-000163.safetensors
-rw-r--r-- 1 root root 4302384504 Mar  8 23:56 model-00046-of-000163.safetensors
-rw-r--r-- 1 root root 4302384961 Mar  8 23:57 model-00047-of-000163.safetensors
-rw-r--r-- 1 root root 4302350620 Mar  8 23:57 model-00048-of-000163.safetensors
-rw-r--r-- 1 root root 4302384692 Mar  8 23:57 model-00049-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar  8 23:57 model-00050-of-000163.safetensors
-rw-r--r-- 1 root root 4302350448 Mar  8 23:57 model-00051-of-000163.safetensors
-rw-r--r-- 1 root root 4302384884 Mar  8 23:57 model-00052-of-000163.safetensors
-rw-r--r-- 1 root root 4302350824 Mar  8 23:57 model-00053-of-000163.safetensors
-rw-r--r-- 1 root root 4302384488 Mar  8 23:57 model-00054-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar  8 23:57 model-00055-of-000163.safetensors
-rw-r--r-- 1 root root 1747417474 Mar  8 23:57 model-00056-of-000163.safetensors
-rw-r--r-- 1 root root 4302317817 Mar  8 23:58 model-00057-of-000163.safetensors
-rw-r--r-- 1 root root 4302384914 Mar  8 23:58 model-00058-of-000163.safetensors
-rw-r--r-- 1 root root 4302350794 Mar  8 23:58 model-00059-of-000163.safetensors
-rw-r--r-- 1 root root 4302384518 Mar  8 23:58 model-00060-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar  8 23:58 model-00061-of-000163.safetensors
-rw-r--r-- 1 root root 4302350602 Mar  8 23:58 model-00062-of-000163.safetensors
-rw-r--r-- 1 root root 4302384710 Mar  8 23:58 model-00063-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar  8 23:58 model-00064-of-000163.safetensors
-rw-r--r-- 1 root root 4302350432 Mar  8 23:58 model-00065-of-000163.safetensors
-rw-r--r-- 1 root root 4302384900 Mar  8 23:58 model-00066-of-000163.safetensors
-rw-r--r-- 1 root root 4302350808 Mar  8 23:59 model-00067-of-000163.safetensors
-rw-r--r-- 1 root root 4302384504 Mar  8 23:59 model-00068-of-000163.safetensors
-rw-r--r-- 1 root root 4302384961 Mar  8 23:59 model-00069-of-000163.safetensors
-rw-r--r-- 1 root root 4302350620 Mar  8 23:59 model-00070-of-000163.safetensors
-rw-r--r-- 1 root root 4302384692 Mar  8 23:59 model-00071-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar  8 23:59 model-00072-of-000163.safetensors
-rw-r--r-- 1 root root 4302350448 Mar  8 23:59 model-00073-of-000163.safetensors
-rw-r--r-- 1 root root 4302384884 Mar  8 23:59 model-00074-of-000163.safetensors
-rw-r--r-- 1 root root 4302350824 Mar  8 23:59 model-00075-of-000163.safetensors
-rw-r--r-- 1 root root 4302384488 Mar  9 00:00 model-00076-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar  9 00:00 model-00077-of-000163.safetensors
-rw-r--r-- 1 root root 1747417474 Mar  9 00:00 model-00078-of-000163.safetensors
-rw-r--r-- 1 root root 4302317817 Mar  9 00:00 model-00079-of-000163.safetensors
-rw-r--r-- 1 root root 4302384914 Mar  9 00:00 model-00080-of-000163.safetensors
-rw-r--r-- 1 root root 4302350794 Mar  9 00:00 model-00081-of-000163.safetensors
-rw-r--r-- 1 root root 4302384518 Mar  9 00:00 model-00082-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar  9 00:00 model-00083-of-000163.safetensors
-rw-r--r-- 1 root root 4302350602 Mar  9 00:00 model-00084-of-000163.safetensors
-rw-r--r-- 1 root root 4302384710 Mar  9 00:00 model-00085-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar  9 00:00 model-00086-of-000163.safetensors
-rw-r--r-- 1 root root 4302350432 Mar  9 00:01 model-00087-of-000163.safetensors
-rw-r--r-- 1 root root 4302384900 Mar  9 00:01 model-00088-of-000163.safetensors
-rw-r--r-- 1 root root 4302350808 Mar  9 00:01 model-00089-of-000163.safetensors
-rw-r--r-- 1 root root 4302384504 Mar  9 00:01 model-00090-of-000163.safetensors
-rw-r--r-- 1 root root 4302384961 Mar  9 00:01 model-00091-of-000163.safetensors
-rw-r--r-- 1 root root 4302350620 Mar  9 00:01 model-00092-of-000163.safetensors
-rw-r--r-- 1 root root 4302384692 Mar  9 00:01 model-00093-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar  9 00:01 model-00094-of-000163.safetensors
-rw-r--r-- 1 root root 4302350448 Mar  9 00:01 model-00095-of-000163.safetensors
-rw-r--r-- 1 root root 4302384884 Mar  9 00:02 model-00096-of-000163.safetensors
-rw-r--r-- 1 root root 4302350824 Mar  9 00:02 model-00097-of-000163.safetensors
-rw-r--r-- 1 root root 4302384488 Mar  9 00:02 model-00098-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar  9 00:02 model-00099-of-000163.safetensors
-rw-r--r-- 1 root root 1747417474 Mar  9 00:02 model-00100-of-000163.safetensors
-rw-r--r-- 1 root root 4302317817 Mar  9 00:02 model-00101-of-000163.safetensors
-rw-r--r-- 1 root root 4302384914 Mar  9 00:02 model-00102-of-000163.safetensors
-rw-r--r-- 1 root root 4302350794 Mar  9 00:02 model-00103-of-000163.safetensors
-rw-r--r-- 1 root root 4302384518 Mar  9 00:02 model-00104-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar  9 00:02 model-00105-of-000163.safetensors
-rw-r--r-- 1 root root 4302350602 Mar  9 00:03 model-00106-of-000163.safetensors
-rw-r--r-- 1 root root 4302384710 Mar  9 00:03 model-00107-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar  9 00:03 model-00108-of-000163.safetensors
-rw-r--r-- 1 root root 4302350432 Mar  9 00:03 model-00109-of-000163.safetensors
-rw-r--r-- 1 root root 4302384900 Mar  9 00:03 model-00110-of-000163.safetensors
-rw-r--r-- 1 root root 4302350808 Mar  9 00:03 model-00111-of-000163.safetensors
-rw-r--r-- 1 root root 4302384504 Mar  9 00:03 model-00112-of-000163.safetensors
-rw-r--r-- 1 root root 4302384961 Mar  9 00:03 model-00113-of-000163.safetensors
-rw-r--r-- 1 root root 4302350620 Mar  9 00:03 model-00114-of-000163.safetensors
-rw-r--r-- 1 root root 4302384692 Mar  9 00:03 model-00115-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar  9 00:04 model-00116-of-000163.safetensors
-rw-r--r-- 1 root root 4302350448 Mar  9 00:04 model-00117-of-000163.safetensors
-rw-r--r-- 1 root root 4302384884 Mar  9 00:04 model-00118-of-000163.safetensors
-rw-r--r-- 1 root root 4302350824 Mar  9 00:04 model-00119-of-000163.safetensors
-rw-r--r-- 1 root root 4302384488 Mar  9 00:04 model-00120-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar  9 00:04 model-00121-of-000163.safetensors
-rw-r--r-- 1 root root 1747417474 Mar  9 00:04 model-00122-of-000163.safetensors
-rw-r--r-- 1 root root 4302317817 Mar  9 00:04 model-00123-of-000163.safetensors
-rw-r--r-- 1 root root 4302384914 Mar  9 00:04 model-00124-of-000163.safetensors
-rw-r--r-- 1 root root 4302350794 Mar  9 00:04 model-00125-of-000163.safetensors
-rw-r--r-- 1 root root 4302384518 Mar  9 00:05 model-00126-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar  9 00:05 model-00127-of-000163.safetensors
-rw-r--r-- 1 root root 4302350602 Mar  9 00:05 model-00128-of-000163.safetensors
-rw-r--r-- 1 root root 4302384710 Mar  9 00:05 model-00129-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar  9 00:05 model-00130-of-000163.safetensors
-rw-r--r-- 1 root root 4302350432 Mar  9 00:05 model-00131-of-000163.safetensors
-rw-r--r-- 1 root root 4302384900 Mar  9 00:05 model-00132-of-000163.safetensors
-rw-r--r-- 1 root root 4302350808 Mar  9 00:05 model-00133-of-000163.safetensors
-rw-r--r-- 1 root root 4302384504 Mar  9 00:05 model-00134-of-000163.safetensors
-rw-r--r-- 1 root root 4302384961 Mar  9 00:05 model-00135-of-000163.safetensors
-rw-r--r-- 1 root root 4302350620 Mar  9 00:06 model-00136-of-000163.safetensors
-rw-r--r-- 1 root root 4302384692 Mar  9 00:06 model-00137-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar  9 00:06 model-00138-of-000163.safetensors
-rw-r--r-- 1 root root 4302350448 Mar  9 00:06 model-00139-of-000163.safetensors
-rw-r--r-- 1 root root 4302384884 Mar  9 00:06 model-00140-of-000163.safetensors
-rw-r--r-- 1 root root 3142388798 Mar  9 00:06 model-00141-of-000163.safetensors
-rw-r--r-- 1 root root 4302317817 Mar  9 00:06 model-00142-of-000163.safetensors
-rw-r--r-- 1 root root 4302384914 Mar  9 00:06 model-00143-of-000163.safetensors
-rw-r--r-- 1 root root 4302350794 Mar  9 00:06 model-00144-of-000163.safetensors
-rw-r--r-- 1 root root 4302384518 Mar  9 00:06 model-00145-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar  9 00:07 model-00146-of-000163.safetensors
-rw-r--r-- 1 root root 4302350602 Mar  9 00:07 model-00147-of-000163.safetensors
-rw-r--r-- 1 root root 4302384710 Mar  9 00:07 model-00148-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar  9 00:07 model-00149-of-000163.safetensors
-rw-r--r-- 1 root root 4302350432 Mar  9 00:07 model-00150-of-000163.safetensors
-rw-r--r-- 1 root root 4302384900 Mar  9 00:07 model-00151-of-000163.safetensors
-rw-r--r-- 1 root root 4302350808 Mar  9 00:07 model-00152-of-000163.safetensors
-rw-r--r-- 1 root root 4302384504 Mar  9 00:07 model-00153-of-000163.safetensors
-rw-r--r-- 1 root root 4302384961 Mar  9 00:07 model-00154-of-000163.safetensors
-rw-r--r-- 1 root root 4302350620 Mar  9 00:08 model-00155-of-000163.safetensors
-rw-r--r-- 1 root root 4302384692 Mar  9 00:08 model-00156-of-000163.safetensors
-rw-r--r-- 1 root root 4302384963 Mar  9 00:08 model-00157-of-000163.safetensors
-rw-r--r-- 1 root root 4302350448 Mar  9 00:08 model-00158-of-000163.safetensors
-rw-r--r-- 1 root root 4302384884 Mar  9 00:08 model-00159-of-000163.safetensors
-rw-r--r-- 1 root root 5230637362 Mar  9 00:08 model-00160-of-000163.safetensors
-rw-r--r-- 1 root root 4302384321 Mar  9 00:08 model-00161-of-000163.safetensors
-rw-r--r-- 1 root root 4302384948 Mar  9 00:08 model-00162-of-000163.safetensors
-rw-r--r-- 1 root root 6584784447 Mar  9 00:08 model-00163-of-000163.safetensors
-rw-r--r-- 1 root root    8898324 Mar  9 00:08 model.safetensors.index.json
-rw-r--r-- 1 root root    7847602 Mar  9 00:08 tokenizer.json
-rw-r--r-- 1 root root       3584 Mar  9 00:08 tokenizer_config.json

Jeffwan avatar Mar 09 '25 08:03 Jeffwan

I initially used the following configuration, but https://huggingface.co/deepseek-ai/DeepSeek-R1/tree/main has an important file named configuration_deepseek.py, which this suffix list does not download:

                - name: DOWNLOADER_ALLOW_FILE_SUFFIX
                  value: json, safetensors

Changing the value to json, safetensors, py resolves the issue. I would say this is a misconfiguration issue, but there are things to improve: we should use an ignore pattern to block .bin files etc. instead of an allowlist.
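To illustrate the difference between the two filtering approaches, here is a minimal Python sketch. It is not the AIBrix downloader's code: the function names, the `*.bin` pattern, and the `pytorch_model.bin` entry in the file list are assumptions made for the example. It only shows why a suffix allowlist silently drops configuration_deepseek.py while an ignore pattern keeps it.

```python
from fnmatch import fnmatch


def kept_by_allowlist(filename, suffixes):
    """Allowlist: keep a file only if its extension is explicitly listed."""
    return any(filename.endswith("." + suffix) for suffix in suffixes)


def kept_by_ignore_patterns(filename, ignore_patterns):
    """Ignore list: keep every file unless it matches an ignore pattern."""
    return not any(fnmatch(filename, pattern) for pattern in ignore_patterns)


# Hypothetical repository listing used only for this example.
files = [
    "config.json",
    "configuration_deepseek.py",
    "model-00001-of-000163.safetensors",
    "pytorch_model.bin",
]

# Mirrors the original setting "json, safetensors".
allowlisted = [f for f in files if kept_by_allowlist(f, ["json", "safetensors"])]

# Assumed ignore pattern that blocks only .bin weights.
not_ignored = [f for f in files if kept_by_ignore_patterns(f, ["*.bin"])]

print("allowlist keeps:", allowlisted)
print("ignore list keeps:", not_ignored)
```

With an ignore list, a file type the repository adds later is downloaded by default, so a missing entry shows up as extra downloaded data rather than a load-time error.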

Jeffwan avatar Mar 09 '25 09:03 Jeffwan

When executing this command, $KUBERAY_GEN_RAY_START_CMD blocks the process, which prevents the subsequent vllm serve command from running: vllm serve /models/deepseek-r1 --trust-remote-code --served-model-name deepseek-r1-671b --tensor-parallel-size 16 --distributed-executor-backend ray --uvicorn-log-level warning. So the command should run the Ray start command in the background: "ulimit -n 65536; echo head; nohup $KUBERAY_GEN_RAY_START_CMD > /tmp/ray_start.log 2>&1 & vllm serve /models/deepseek-r1 --trust-remote-code --served-model-name deepseek-r1-671b --tensor-parallel-size 16 --distributed-executor-backend ray --uvicorn-log-level warning"

ying2025 avatar Mar 10 '25 03:03 ying2025

@ying2025 which kuberay version are you using?

Jeffwan avatar Mar 10 '25 04:03 Jeffwan

ray, version 2.42.0

ying2025 avatar Mar 10 '25 06:03 ying2025

  1. Did you add block: 'false' to the rayStartParams? This is required to remove --block from the startup command.
  2. The underlying KubeRay operator has a bug in disabling --block. We fixed it and built an image, aibrix/kuberay-operator:v1.2.1-patch, and we are working with upstream to bring that change back soon. Please confirm you are using this version; you can run kubectl describe deployment aibrix-kuberay-operator -n aibrix-system to verify it.

Some docs might be outdated. Feel free to check https://github.com/vllm-project/aibrix/blob/main/samples/deepseek-r1/deepseek-r1-huggingface.yaml as an example.
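As a rough aid for the first check, the sketch below scans a manifest that has already been loaded into a Python dict and reports which Ray groups do not set block: 'false'. The function name and the sample manifest are hypothetical. It assumes the dict passed in holds headGroupSpec and workerGroupSpecs directly under spec, as in a KubeRay RayCluster; if your resource nests the cluster spec under a template, pass that inner object instead.

```python
def groups_missing_block_false(manifest):
    """Return the names of Ray groups whose rayStartParams lack block: 'false'."""
    spec = manifest.get("spec", {})

    # Collect (name, group) pairs for the head group and every worker group.
    groups = [("head", spec.get("headGroupSpec", {}))]
    for worker in spec.get("workerGroupSpecs", []):
        groups.append((worker.get("groupName", "worker"), worker))

    missing = []
    for name, group in groups:
        params = group.get("rayStartParams") or {}
        # Compare as a lowercase string so 'false' and False are both accepted.
        if str(params.get("block", "")).lower() != "false":
            missing.append(name)
    return missing


# Hypothetical manifest: the head group is configured, the worker group is not.
example = {
    "spec": {
        "headGroupSpec": {"rayStartParams": {"block": "false"}},
        "workerGroupSpecs": [
            {"groupName": "small-group", "rayStartParams": {}},
        ],
    }
}

print(groups_missing_block_false(example))
```

This only checks the manifest. Whether --block is actually dropped from the generated start command still depends on the operator image in use, as described in the second point.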

Jeffwan avatar Mar 10 '25 07:03 Jeffwan

OK, after I updated the kuberay-operator and added block: 'false' to the rayStartParams, it works.

ying2025 avatar Mar 12 '25 06:03 ying2025

@ying2025 thanks for the confirmation

Jeffwan avatar Mar 12 '25 07:03 Jeffwan

2. v1.2.1-patch

Will this code change be merged into ray, or can you provide the relevant code?

ying2025 avatar Apr 07 '25 07:04 ying2025

@ying2025 Yeah, it will be part of KubeRay. I am asking an engineer to help with it. Here's the code change: https://github.com/ray-project/kuberay/commit/91e1c26fbf1fc0f505ff7d16b70cf8228ed62ec4#diff-cc9abb27aaceca3f10193e2ab35fb00dca44b8858709c5c0f4df751c1387291aR576 and the original issue: https://github.com/vllm-project/aibrix/issues/245#issuecomment-2394811082

Jeffwan avatar Apr 07 '25 20:04 Jeffwan

ok, thanks

ying2025 avatar Apr 08 '25 01:04 ying2025