
FP8 Quantization Does Not Work

dawenxi-007 opened this issue 1 year ago · 1 comment

I was trying to run inference with FP8 quantization and got the following error:

Configuring API surface: inference
Enter value for model (existing: Meta-Llama3.1-8B-Instruct) (required): Meta-Llama3.1-8B-Instruct
Enter value for quantization (optional): fp8
Enter value for torch_seed (optional):
Enter value for max_seq_len (existing: 4096) (required): 4096
Enter value for max_batch_size (existing: 1) (required): 1
Traceback (most recent call last):
  File "/home/ubuntu/miniforge3/envs/local-llama-8b/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ubuntu/miniforge3/envs/local-llama-8b/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/taoz/llama-stack/llama_toolchain/cli/llama.py", line 58, in <module>
    main()
  File "/home/ubuntu/taoz/llama-stack/llama_toolchain/cli/llama.py", line 54, in main
    parser.run(args)
  File "/home/ubuntu/taoz/llama-stack/llama_toolchain/cli/llama.py", line 48, in run
    args.func(args)
  File "/home/ubuntu/taoz/llama-stack/llama_toolchain/cli/distribution/configure.py", line 59, in _run_distribution_configure_cmd
    configure_llama_distribution(dist, config)
  File "/home/ubuntu/taoz/llama-stack/llama_toolchain/cli/distribution/configure.py", line 86, in configure_llama_distribution
    provider_config = prompt_for_config(
  File "/home/ubuntu/taoz/llama-stack/llama_toolchain/common/prompt_for_config.py", line 252, in prompt_for_config
    return config_type(**config_data)
  File "/home/ubuntu/miniforge3/envs/local-llama-8b/lib/python3.10/site-packages/pydantic/main.py", line 193, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for MetaReferenceImplConfig
quantization
  Input should be a valid dictionary or object to extract fields from [type=model_attributes_type, input_value='fp8', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/model_attributes_type
Error occurred in script at line: 112
Failed to install distribution local
Traceback (most recent call last):
  File "/home/ubuntu/miniforge3/envs/llama-stack/bin/llama", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/taoz/llama-stack/llama_toolchain/cli/llama.py", line 54, in main
    parser.run(args)
  File "/home/ubuntu/taoz/llama-stack/llama_toolchain/cli/llama.py", line 48, in run
    args.func(args)
  File "/home/ubuntu/taoz/llama-stack/llama_toolchain/cli/distribution/install.py", line 105, in _run_distribution_install_cmd
    assert return_code == 0, cprint(
AssertionError: None
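The root cause is the pydantic error: `MetaReferenceImplConfig.quantization` is declared as a nested model, so the validator expects a dictionary/object and rejects the bare string `fp8` entered at the prompt. A minimal sketch of that behavior, using simplified stand-in models (the field names here are assumptions, not llama-stack's actual schema):

```python
from typing import Optional
from pydantic import BaseModel, ValidationError

# Simplified stand-ins for the real llama-stack config models.
class QuantizationConfig(BaseModel):
    type: str  # e.g. "fp8"

class MetaReferenceImplConfig(BaseModel):
    model: str
    quantization: Optional[QuantizationConfig] = None

# A bare string fails validation, mirroring the traceback above:
try:
    MetaReferenceImplConfig(
        model="Meta-Llama3.1-8B-Instruct", quantization="fp8"
    )
except ValidationError as e:
    print("rejected:", e.errors()[0]["type"])

# A dict matching the nested model's fields validates cleanly:
ok = MetaReferenceImplConfig(
    model="Meta-Llama3.1-8B-Instruct",
    quantization={"type": "fp8"},
)
print(ok.quantization.type)  # "fp8"
```

So the configure prompt would need to either accept a structured value for `quantization` or translate the string `fp8` into the nested config object before constructing `MetaReferenceImplConfig`.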

dawenxi-007 · Aug 28 '24 06:08