
Does not run on Titan RTX - demands Bfloat16

Open · freemansoft opened this issue on Jul 20, 2024 · 2 comments

I received this message when starting the LLM:

ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your NVIDIA TITAN RTX GPU has compute capability 7.5. You can use float16 instead by explicitly setting the `dtype` flag in CLI, for example: --dtype=half.
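For context, the compute capability the error refers to can be confirmed directly from the driver; newer nvidia-smi builds expose it as a query field (this assumes a driver recent enough to support the compute_cap field):

# Query the compute capability of each visible GPU.
# Titan RTX (Turing) reports 7.5; bfloat16 needs 8.0+ (Ampere or newer).
nvidia-smi --query-gpu=name,compute_cap --format=csv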

Is there a way to

  1. Set the desired precision? (A possible workaround is sketched after this list.)
  2. Select a different model that works for my Titan RTX? What models could be swapped in for LLM_NIM_0_MODEL=meta/llama3-8b-instruct?
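As a point of comparison, vanilla vLLM does accept the dtype override the error message suggests. A rough sketch of what I mean, assuming vLLM is installed locally and you have Hugging Face access to the gated Llama 3 weights (meta-llama/Meta-Llama-3-8B-Instruct is the HF model id, not the NIM one):

# Serve Llama 3 8B Instruct with vanilla vLLM, forcing fp16 instead of bf16.
# --dtype half is the flag the ValueError suggests; it maps to torch.float16.
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Meta-Llama-3-8B-Instruct \
  --dtype half \
  --max-model-len 8192

That runs on compute capability 7.5, but it bypasses the NIM container entirely, which is not really what I want.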

The model starts up with the following log:

INFO 07-20 23:02:05.815 ngc_injector.py:146] Profile metadata: tp: 1
INFO 07-20 23:02:05.815 ngc_injector.py:146] Profile metadata: feat_lora: false
INFO 07-20 23:02:05.815 ngc_injector.py:146] Profile metadata: precision: fp16
INFO 07-20 23:02:05.815 ngc_injector.py:146] Profile metadata: llm_engine: vllm
INFO 07-20 23:02:05.815 ngc_injector.py:166] Preparing model workspace. This step might download additional files to run the model.
INFO 07-20 23:02:08.174 ngc_injector.py:172] Model workspace is now ready. It took 2.359 seconds
INFO 07-20 23:02:08.180 llm_engine.py:98] Initializing an LLM engine (v0.4.1) with config: model='/tmp/meta--llama3-8b-instruct-0_0f1rb6', speculative_config=None, tokenizer='/tmp/meta--llama3-8b-instruct-0_0f1rb6', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0)

Is there any way to change the dtype? I tried running this with other models that were described as able to run with float16, but the startup always seems to choose bfloat16. Ref: https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/performance/perf-overview.md
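One thing I have not been able to verify: whether the container's profile selection can be pinned. The profile metadata in the log above says precision: fp16, yet the engine initializes with dtype=torch.bfloat16. If this image supports the list-model-profiles utility and the NIM_MODEL_PROFILE variable that NVIDIA documents for NIM containers (this may depend on the NIM version), something like the following should show which profiles exist and let me force one explicitly (IMG_NAME and the profile id are placeholders):

# List the profiles baked into the NIM image.
docker run --rm --runtime=nvidia --gpus=all -e NGC_API_KEY $IMG_NAME list-model-profiles

# Then pin a specific profile id from that list when starting the server.
docker run --rm --runtime=nvidia --gpus=all -e NGC_API_KEY \
  -e NIM_MODEL_PROFILE=<profile-id-from-the-list> \
  -p 8000:8000 $IMG_NAME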

freemansoft · Jul 20 '24 22:07