tensorrtllm_backend
Support bfloat16 LoRA Adapters
I have a Mistral 7B model with fine-tuned LoRA weights in bfloat16.
I ran into issues when attempting to use my adapters, which were compiled for bfloat16.
I ran the following command to convert them to the .npy
format so that I could follow the example:
python3 hf_lora_convert.py \
-o ${LOCAL_COMPILED_WEIGHTS}/lora/0 \
-i ${LORA_DIR_1} \
--storage-type bfloat16 \
--verbose
This results in the following error:
[TensorRT-LLM] TensorRT-LLM version: 0.9.0.dev2024032600
Traceback (most recent call last):
  File "/code/hf_lora_convert.py", line 176, in <module>
    main(args)
  File "/code/hf_lora_convert.py", line 141, in main
    convert_hf_model(args.in_file, args.storage_type, args.out_dir)
  File "/code/hf_lora_convert.py", line 122, in convert_hf_model
    dim=0).unsqueeze(0).to(dtype=str_dtype_to_torch(dtype)).cpu().numpy()
TypeError: Got unsupported ScalarType BFloat16
This is likely a limitation of NumPy not natively supporting bfloat16.
I went ahead and converted to float32 instead, just to continue testing (hoping that precision was maintained).
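For reference, a minimal sketch of the conversion issue and the float32 workaround I used (my own standalone example, not code from hf_lora_convert.py):

import torch

# A bfloat16 tensor cannot be handed to NumPy directly:
w = torch.randn(8, 8, dtype=torch.bfloat16)
try:
    w.cpu().numpy()
except TypeError as e:
    print(e)  # "Got unsupported ScalarType BFloat16"

# Workaround: upcast to float32 before converting to NumPy.
# Every bfloat16 value is exactly representable in float32,
# so the upcast itself does not lose precision.
w_np = w.to(torch.float32).cpu().numpy()

The upcast is lossless on the way out, but of course the resulting .npy file is float32, which is what then trips the dtype check in the backend below.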
....
I had some hope that in Triton I would still be able to use bfloat16, because I see BF16 listed as a supported datatype.
When I load the models and configs and send them to the backend (which was compiled for bfloat16)
via triton-inference-server, I get the following error:
[TensorRT-LLM][ERROR] Assertion failed: Expected lora weights to be the same data type as base model (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/loraUtils.cpp:66)
Is there any way for me to pass in the adapter weights as bfloat16?
I noticed that TYPE_BF16 is listed here,
but pb_utils.triton_string_to_numpy("TYPE_BF16") does not seem able to handle it (since NumPy has no native bfloat16 dtype).
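One possible workaround I considered (my own sketch, not something documented by the backend): keep the raw bfloat16 bit pattern and carry it through NumPy as a 2-byte view, using the third-party ml_dtypes package, which registers a bfloat16 dtype with NumPy. Whether Triton or the TensorRT-LLM backend would accept a tensor packed this way is exactly what I am asking:

import numpy as np
import torch
import ml_dtypes  # third-party package that adds a NumPy bfloat16 dtype

# Reinterpret the bfloat16 bits as int16 so NumPy can hold them unchanged...
w = torch.randn(8, 8, dtype=torch.bfloat16)
w_bits = w.view(torch.int16).cpu().numpy()

# ...then view the same bits as bfloat16 on the NumPy side (no data copy).
w_bf16 = w_bits.view(ml_dtypes.bfloat16)
print(w_bf16.dtype, w_bf16.shape)  # bfloat16 (8, 8)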
Expected behavior
A documented example of using bfloat16 models with LoRA adapters.
Actual behavior
Only FP16 examples are provided.