avianion issues

Results: 13 issues of avianion

Support python runtime

Triton Inference Server supports the C++ runtime for TensorRT-LLM, but it would be great to also support the Python runtime.

Multi node multi GPU sharding for inference / training Llama 405B

Hello, we are trying to use DeepSpeed to load Llama 405B across 2 nodes of 8 x H100 SXM each. We want to shard the model across all 16 GPUs...

CUDA memcpy failure with float16 in a TensorRT 10.3 custom plugin when running inference on H100 NVL x 2

## Description I am unable to perform an identity operation that copies a float16 tensor out 1:1. It works fine with int32 and int64 tensors, but...

Plugins

triaged