avianion

Results 13 issues of avianion

Triton Inference Server supports the C++ runtime for TensorRT-LLM, but it would be great to also support the Python runtime.

Hello, we are trying to use DeepSpeed to load Llama 405B across 2 nodes of 8 x H100 SXM each. We want to shard the model across all 16 GPUs...

## Description I am unable to perform an identity operation, which copies a float16 tensor out 1:1. It works fine with int32 and int64 tensors, but...

Plugins
triaged