llama-stack
FP8 Quantization Does Not Work
I tried to run inference with FP8 quantization and got the following error:
Configuring API surface: inference
Enter value for model (existing: Meta-Llama3.1-8B-Instruct) (required): Meta-Llama3.1-8B-Instruct
Enter value for quantization (optional): fp8
Enter value for torch_seed (optional):
Enter value for max_seq_len (existing: 4096) (required): 4096
Enter value for max_batch_size (existing: 1) (required): 1
Traceback (most recent call last):
  File "/home/ubuntu/miniforge3/envs/local-llama-8b/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ubuntu/miniforge3/envs/local-llama-8b/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/taoz/llama-stack/llama_toolchain/cli/llama.py", line 58, in <module>
    main()
  File "/home/ubuntu/taoz/llama-stack/llama_toolchain/cli/llama.py", line 54, in main
    parser.run(args)
  File "/home/ubuntu/taoz/llama-stack/llama_toolchain/cli/llama.py", line 48, in run
    args.func(args)
  File "/home/ubuntu/taoz/llama-stack/llama_toolchain/cli/distribution/configure.py", line 59, in _run_distribution_configure_cmd
    configure_llama_distribution(dist, config)
  File "/home/ubuntu/taoz/llama-stack/llama_toolchain/cli/distribution/configure.py", line 86, in configure_llama_distribution
    provider_config = prompt_for_config(
  File "/home/ubuntu/taoz/llama-stack/llama_toolchain/common/prompt_for_config.py", line 252, in prompt_for_config
    return config_type(**config_data)
  File "/home/ubuntu/miniforge3/envs/local-llama-8b/lib/python3.10/site-packages/pydantic/main.py", line 193, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for MetaReferenceImplConfig
quantization
  Input should be a valid dictionary or object to extract fields from [type=model_attributes_type, input_value='fp8', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/model_attributes_type
Error occurred in script at line: 112
Failed to install distribution local
Traceback (most recent call last):
  File "/home/ubuntu/miniforge3/envs/llama-stack/bin/llama", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/taoz/llama-stack/llama_toolchain/cli/llama.py", line 54, in main
    parser.run(args)
  File "/home/ubuntu/taoz/llama-stack/llama_toolchain/cli/llama.py", line 48, in run
    args.func(args)
  File "/home/ubuntu/taoz/llama-stack/llama_toolchain/cli/distribution/install.py", line 105, in _run_distribution_install_cmd
    assert return_code == 0, cprint(
AssertionError: None
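The pydantic error in the first traceback suggests that the `quantization` field of `MetaReferenceImplConfig` is typed as a nested model, so it expects a dictionary or object rather than the bare string `fp8` that the configure prompt passes through. Here is a minimal sketch of that behavior (the class definitions below are assumed stand-ins for illustration, not the actual llama-stack config classes):

```python
from typing import Optional
from pydantic import BaseModel, ValidationError

# Hypothetical stand-ins for llama-stack's config classes.
class QuantizationConfig(BaseModel):
    type: str  # e.g. "fp8"

class MetaReferenceImplConfig(BaseModel):
    model: str
    quantization: Optional[QuantizationConfig] = None

# A bare string fails validation, much like the traceback above:
try:
    MetaReferenceImplConfig(model="Meta-Llama3.1-8B-Instruct", quantization="fp8")
    raised = False
except ValidationError:
    raised = True

# A dict (or a QuantizationConfig instance) validates cleanly:
cfg = MetaReferenceImplConfig(
    model="Meta-Llama3.1-8B-Instruct",
    quantization={"type": "fp8"},
)
```

If this is the cause, either the prompt code needs to wrap the user's `fp8` answer into the expected structure before constructing the config, or the field needs a validator that accepts a plain string.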