
Cannot run vectordb-bench in offline env

Open cydrain opened this issue 2 months ago • 6 comments

I ran into some issues when running vectordb-bench in an offline environment.

Steps to reproduce:

  1. download cohere_medium_1m dataset
  2. run milvus standalone
  3. install vectordb-bench official pip package
pip install vectordb-bench
pip install vectordb-bench milvus
  4. run "init_bench", then run a task; the flow errors out with this traceback:
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/caiyd/miniconda3/envs/vdb/lib/python3.11/site-packages/vectordb_bench/interface.py", line 182, in _async_task_v2
    case_res.metrics = runner.run(drop_old)
                       ^^^^^^^^^^^^^^^^^^^^
  File "/home/caiyd/miniconda3/envs/vdb/lib/python3.11/site-packages/vectordb_bench/backend/task_runner.py", line 120, in run
    self._pre_run(drop_old)
  File "/home/caiyd/miniconda3/envs/vdb/lib/python3.11/site-packages/vectordb_bench/backend/task_runner.py", line 112, in _pre_run
    self.ca.dataset.prepare(self.dataset_source, filters=self.ca.filters)
  File "/home/caiyd/miniconda3/envs/vdb/lib/python3.11/site-packages/vectordb_bench/backend/dataset.py", line 368, in prepare
    source.reader().read(
  File "/home/caiyd/miniconda3/envs/vdb/lib/python3.11/site-packages/vectordb_bench/backend/data_source.py", line 133, in read
    if (not local_file.exists()) or (not self.validate_file(remote_file, local_file)):
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/caiyd/miniconda3/envs/vdb/lib/python3.11/site-packages/vectordb_bench/backend/data_source.py", line 149, in validate_file
    info = self.fs.info(remote)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/caiyd/miniconda3/envs/vdb/lib/python3.11/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  5. I wanted to comment out the line "if (not local_file.exists()) or (not self.validate_file(remote_file, local_file)):", so I uninstalled vectordb-bench and installed it locally from source, but now it cannot start up:
(vdb)  ~/work/zilliz/VectorDBBench/ [tags/v1.0.10] pip uninstall -y vectordb-bench               
Found existing installation: vectordb-bench 1.0.11
Uninstalling vectordb-bench-1.0.11:
  Successfully uninstalled vectordb-bench-1.0.11
(vdb)  ~/work/zilliz/VectorDBBench/ [tags/v1.0.10] 
(vdb)  ~/work/zilliz/VectorDBBench/ [tags/v1.0.10] pip cache purge                
Files removed: 286 (16.3 MB)
(vdb)  ~/work/zilliz/VectorDBBench/ [tags/v1.0.10] 
(vdb)  ~/work/zilliz/VectorDBBench/ [tags/v1.0.10] pip install -e ".[milvus]"
Obtaining file:///home/caiyd/work/zilliz/VectorDBBench
  Installing build dependencies ... done
......
55130281d18c9ffcdf77cae9
  Stored in directory: /tmp/pip-ephem-wheel-cache-n7c1wn_2/wheels/ac/03/a2/762233e655726a69db9d9b05989bd48a40b5ae020618dc4590
Successfully built vectordb-bench
Installing collected packages: vectordb-bench
Successfully installed vectordb-bench-1.0.10
(vdb)  ~/work/zilliz/VectorDBBench/ [tags/v1.0.10] 
(vdb)  ~/work/zilliz/VectorDBBench/ [tags/v1.0.10] pip list | grep vectordb-bench
130:vectordb-bench              1.0.10      /home/caiyd/work/zilliz/VectorDBBench
(vdb)  ~/work/zilliz/VectorDBBench/ [tags/v1.0.10] 
(vdb)  ~/work/zilliz/VectorDBBench/ [tags/v1.0.10] init_bench
Traceback (most recent call last):
  File "/home/caiyd/miniconda3/envs/vdb/bin/init_bench", line 3, in <module>
    from vectordb_bench.__main__ import main
  File "/home/caiyd/work/zilliz/VectorDBBench/vectordb_bench/__main__.py", line 6, in <module>
    from . import config
ImportError: cannot import name 'config' from 'vectordb_bench' (unknown location)
(vdb)  ~/work/zilliz/VectorDBBench/ [tags/v1.0.10] 

cydrain avatar Oct 24 '25 04:10 cydrain

Please try this instead of init_bench:

```
python -m vectordb_bench
```

XuanYang-cn avatar Oct 24 '25 07:10 XuanYang-cn

Looks like it was an environment issue. After I created a new conda env and installed with "pip install -e .[milvus]", this issue went away.

Thanks

cydrain avatar Oct 24 '25 09:10 cydrain

@XuanYang-cn As I mentioned in this issue, when running vectordb-bench in an offline environment, it hangs at this line:

if (not local_file.exists()) or (not self.validate_file(remote_file, local_file)):

It would be great to have a parameter, say “--offline_run”, that skips such network-dependent API calls.
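
To make the request concrete, here is a minimal sketch of the behavior I have in mind. It is not the actual vectordb-bench code: `needs_download` and its `offline` parameter are hypothetical names, `fs` stands for any fsspec-style filesystem object with an `info(path)` method, and the size comparison is only my assumption about what the validation checks.

```python
import pathlib


def needs_download(local_file: pathlib.Path, remote_file: str, fs, offline: bool = False) -> bool:
    """Return True if remote_file should be (re)downloaded to local_file.

    Hypothetical helper, not part of vectordb-bench. `fs` is any
    fsspec-style filesystem exposing info(path) -> dict with a "size" key.
    """
    if not local_file.exists():
        if offline:
            # Nothing can be fetched offline, so fail early with a clear message.
            raise FileNotFoundError(f"{local_file} is missing and offline mode is on")
        return True

    if offline:
        # Trust the local copy and never touch the network.
        return False

    # Online path: assumed validation rule, compare remote and local sizes.
    remote_size = fs.info(remote_file)["size"]
    return remote_size != local_file.stat().st_size
```

With `offline=True` the remote `fs.info()` call is never reached, so an already downloaded dataset can be used without network access, and a missing file is reported directly instead of surfacing as a network error.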

cydrain avatar Oct 25 '25 01:10 cydrain

> Try this instead of init_bench plz: python -m vectordb_bench

One more question: what's the difference between "python -m vectordb_bench" and "init_bench"?

cydrain avatar Oct 25 '25 01:10 cydrain

> > Try this instead of init_bench plz: python -m vectordb_bench
>
> one more question, what's the different between "python -m vectordb_bench" and "init_bench" ?

Both commands ultimately execute the same entry point, main(), defined in vectordb_bench/__main__.py.

However, there’s an important difference in how the Python environment is chosen:

init_bench is a console script installed by pip. Its first line (the shebang) hard-codes the path of the Python interpreter used during installation, and the shell locates the script itself based on your PATH order. This means that if you have multiple environments, whichever init_bench comes first on PATH is the one that runs, and it always uses the interpreter of the environment it was originally installed in, regardless of which environment is currently active.

python -m vectordb_bench, on the other hand, explicitly uses the Python interpreter you invoke. For example, running

python3.10 -m vectordb_bench

will execute the module inside Python 3.10’s environment, while

python3.13 -m vectordb_bench

runs it in Python 3.13’s environment.

This approach provides better isolation and reproducibility when you work with multiple Python versions or virtual environments, ensuring that the correct dependencies and configurations are used for each interpreter.
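
If you want to check which interpreter and which copy of the package each command would actually use, a small diagnostic along these lines may help. It is only a sketch: `describe_entry_point` is a hypothetical helper, and the shebang check assumes a POSIX-style console script (on Windows the launcher is an .exe, so the shebang field will stay empty).

```python
import importlib.util
import shutil
import sys


def describe_entry_point(script_name: str = "init_bench", package: str = "vectordb_bench") -> dict:
    """Report which script, interpreter, and package location are in effect."""
    # First match on PATH, exactly what the shell would run.
    script_path = shutil.which(script_name)

    shebang = None
    if script_path is not None:
        with open(script_path, "rb") as fh:
            first_line = fh.readline().decode("utf-8", errors="replace").strip()
        if first_line.startswith("#!"):
            # Interpreter hard-coded into the console script at install time.
            shebang = first_line[2:].strip()

    # Where the *current* interpreter would import the package from.
    spec = importlib.util.find_spec(package)

    return {
        "current_interpreter": sys.executable,
        "script_path": script_path,
        "script_interpreter": shebang,
        "package_location": None if spec is None else spec.origin,
    }
```

Run it with the same interpreter you use for `python -m vectordb_bench` and print the result. If `script_interpreter` differs from `current_interpreter`, the two commands are running in different environments. A `package_location` of `None` means the package was either not found or resolved without a regular `__init__.py`, which would be consistent with the "(unknown location)" in the ImportError above.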

XuanYang-cn avatar Oct 27 '25 02:10 XuanYang-cn

@XuanYang-cn In that case, my biggest request is to have a command-line option that lets me skip the file-validity check, because in an offline environment the process exits when that API call "self.validate_file()" fails.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/caiyd/miniconda3/envs/vdb/lib/python3.11/site-packages/vectordb_bench/interface.py", line 182, in _async_task_v2
    case_res.metrics = runner.run(drop_old)
                       ^^^^^^^^^^^^^^^^^^^^
  File "/home/caiyd/miniconda3/envs/vdb/lib/python3.11/site-packages/vectordb_bench/backend/task_runner.py", line 120, in run
    self._pre_run(drop_old)
  File "/home/caiyd/miniconda3/envs/vdb/lib/python3.11/site-packages/vectordb_bench/backend/task_runner.py", line 112, in _pre_run
    self.ca.dataset.prepare(self.dataset_source, filters=self.ca.filters)
  File "/home/caiyd/miniconda3/envs/vdb/lib/python3.11/site-packages/vectordb_bench/backend/dataset.py", line 368, in prepare
    source.reader().read(
  File "/home/caiyd/miniconda3/envs/vdb/lib/python3.11/site-packages/vectordb_bench/backend/data_source.py", line 133, in read
    if (not local_file.exists()) or (not self.validate_file(remote_file, local_file)):
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/caiyd/miniconda3/envs/vdb/lib/python3.11/site-packages/vectordb_bench/backend/data_source.py", line 149, in validate_file
    info = self.fs.info(remote)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/caiyd/miniconda3/envs/vdb/lib/python3.11/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

cydrain avatar Nov 06 '25 02:11 cydrain