Is this fixed in the latest 0.10.1?
@dosu-bot Is this supported in the latest version?
Does it support LLaMA now?
Got the same issue with a pre-built 4090 engine.
> I have heard that the architecture of Zephyr is very similar to LLaMA. Does TensorRT-LLM not work currently on Zephyr?
>
> I am hoping to understand what makes...
Please add CohereAI!! CohereForAI/c4ai-command-r-plus
Would INT4 quantization with FP16 work with multi-GPU on the 70B version? Has anyone tried it?
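For reference, here is a minimal sketch of what that might look like, assuming you start from the original FP16 HF checkpoint and let the Llama example's `convert_checkpoint.py` do INT4 weight-only quantization. The flags (`--use_weight_only`, `--weight_only_precision`, `--tp_size`) and paths below are taken from TensorRT-LLM's Llama example as I understand it and may differ across versions, so verify against your install:
```
# Convert the HF checkpoint to INT4 weight-only with FP16 activations,
# sharded for 2-way tensor parallelism (flags from the Llama example's
# convert_checkpoint.py; check them against your TensorRT-LLM version)
python3 convert_checkpoint.py \
    --model_dir ./Meta-Llama-3-70B-Instruct \
    --output_dir ./tllm_ckpt_70b_int4_tp2 \
    --dtype float16 \
    --use_weight_only \
    --weight_only_precision int4 \
    --tp_size 2

# Build the engine from the converted checkpoint
trtllm-build \
    --checkpoint_dir ./tllm_ckpt_70b_int4_tp2 \
    --output_dir ./engine_70b_int4_tp2 \
    --gemm_plugin float16

# Run across both GPUs with MPI (examples/run.py in the repo)
mpirun -n 2 python3 run.py \
    --engine_dir ./engine_70b_int4_tp2 \
    --tokenizer_dir ./Meta-Llama-3-70B-Instruct \
    --input_text "Hello" --max_output_len 64
```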
Did anyone try this?
@njaramish Thanks!!! Do you know if it's possible to build it quantized, since the model only fits on multiple GPUs when quantized? I tried this:
```
python3 convert_checkpoint.py --model_dir /root/.cache/huggingface/hub/models--Melon--Meta-Llama-3-70B-Instruct-AutoAWQ-4bit/snapshots/dc5cc4388d36c571d18f091e31decd82ab6621ed \
...
```
But I have 3 GPUs (3x 4090), is that an issue?
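It might be. Tensor parallelism in TensorRT-LLM typically requires the model's head counts to divide evenly by `--tp_size` (an assumption based on how the attention layers are sharded). Llama-3-70B has 64 query heads and 8 KV heads, so 2, 4, or 8 GPUs shard cleanly while 3 does not; a quick check:
```
# Llama-3-70B: 64 query heads, 8 KV heads (values from the model config)
python3 -c "print(64 % 3, 8 % 3)"  # -> 1 2: tp_size=3 does not divide evenly
python3 -c "print(64 % 2, 8 % 2)"  # -> 0 0: tp_size=2 shards cleanly
```
So with 3x 4090 you would most likely build with `--tp_size 2` and leave one GPU idle (again, an assumption; check the docs for your version).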