
FileNotFoundError when running run_llama3.py: missing ../auto_search/8B_search_result_large_btz.json

Open • sihyeong opened this issue 3 months ago • 1 comment

Description

I am following the End-to-end Test section in the README and running the project inside the Docker environment described in the README. During execution, the script crashes with a FileNotFoundError when attempting to open ../auto_search/8B_search_result_large_btz.json. It seems the pipeline expects a profiling result file that is not present by default.

Environment

  • Hardware: NVIDIA H100 80G
  • OS: Docker (cuda:12.8.1-cudnn-devel-ubuntu22.04)

Command run: python run_llama3.py --load_hf_weight

Error Log

/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]
W0911 02:35:50.046000 375 torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
W0911 02:35:50.046000 375 torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
...
Traceback (most recent call last):
  File "/root/Nanoflow/entry/run_llama3.py", line 231, in <module>
    test_performance()
  File "/root/Nanoflow/entry/run_llama3.py", line 82, in test_performance
    pipeline.update(decode_inputs, decode_batch_size, profile_result_path="../auto_search/8B_search_result_large_btz.json", use_cuda_graph=True, use_nano_split=True)
  File "/root/Nanoflow/entry/../models/llama3_FlashinferKVCache.py", line 475, in update
    with open(profile_result_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '../auto_search/8B_search_result_large_btz.json'

sihyeong, Sep 11 '25 02:09

You could try test_correctness first. test_performance() is designed to measure the performance of our auto search result, so it expects that result to exist already. To produce it, you need to run profiling first, and then run the new_search.py script in Nanoflow-python/autosearch, which outputs a JSON file. The profiling code is in the same run_llama3.py script.

Wazrrr, Sep 16 '25 06:09
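
Since the path in the traceback is relative, Python resolves it against the current working directory, so the file is looked up one level above wherever the script is launched from. A minimal sketch of a pre-flight check that fails early with a clearer message is shown below. The helper name `resolve_profile_result` and its `base_dir` parameter are hypothetical and not part of Nanoflow; the sketch only uses the standard library and assumes the JSON file is produced by the profiling and auto search steps described above.

```python
from pathlib import Path


def resolve_profile_result(relative_path: str, base_dir: Path) -> Path:
    """Resolve a profile result path and fail early with a helpful hint.

    Hypothetical helper for illustration only; it is not a Nanoflow API.
    """
    # Resolve the relative path against an explicit base directory so the
    # error message shows the absolute location that was actually checked.
    candidate = (base_dir / relative_path).resolve()
    if not candidate.is_file():
        raise FileNotFoundError(
            f"Profile result not found: {candidate}. "
            "Run profiling and the auto search step first to generate it."
        )
    return candidate
```

Calling `resolve_profile_result("../auto_search/8B_search_result_large_btz.json", Path.cwd())` before test_performance() would report the absolute path that is missing, which makes it easier to tell a wrong working directory apart from a search result that has not been generated yet.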