avianion
Results: 13 issues of avianion
Triton Inference Server supports the C++ runtime for TensorRT-LLM, but it would be great to also support the Python runtime.
Hello. We are trying to use DeepSpeed to load Llama 405B across 2 nodes of 8 x H100 SXM each. We want to shard the model across all 16 GPUs...
## Description I am unable to perform an identity operation, that is, copying a float16 tensor out 1:1. It works fine with int32 and int64 tensors, but...
Plugins
triaged