Apurba Bose
Addresses #3064
This PR illustrates the use of NCCL ops from TRT-LLM in the example `examples/distributed_inference/tensor_parallel_simple_example.py`.
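The following plain-Python sketch runs on a single process and shows why NCCL collectives appear in a tensor-parallel graph. It simulates two common ways of sharding a linear layer across `world_size` ranks. If each rank owns a block of output features, an all-gather has to concatenate the per-rank slices. If each rank owns a block of input features, an all-reduce (sum) has to combine the partial outputs. The function names are hypothetical and nothing here calls TRT-LLM or NCCL. The loops over `rank` only stand in for what the collectives would do across GPUs, and the sketch assumes the sharded dimension divides evenly by `world_size`.

```python
from typing import List

Matrix = List[List[int]]
Vector = List[int]


def matvec(weight: Matrix, x: Vector) -> Vector:
    """Compute y = W @ x for a dense row-major matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def output_sharded_linear(weight: Matrix, x: Vector, world_size: int) -> Vector:
    """Each rank owns a block of output rows; an all-gather concatenates the slices."""
    shard = len(weight) // world_size  # assumes even divisibility
    gathered: Vector = []
    for rank in range(world_size):
        local_rows = weight[rank * shard:(rank + 1) * shard]
        gathered.extend(matvec(local_rows, x))  # stands in for all-gather
    return gathered


def input_sharded_linear(weight: Matrix, x: Vector, world_size: int) -> Vector:
    """Each rank owns a block of input columns; an all-reduce sums the partial outputs."""
    shard = len(x) // world_size  # assumes even divisibility
    partials: List[Vector] = []
    for rank in range(world_size):
        cols = slice(rank * shard, (rank + 1) * shard)
        local_weight = [row[cols] for row in weight]
        partials.append(matvec(local_weight, x[cols]))
    return [sum(vals) for vals in zip(*partials)]  # stands in for all-reduce (sum)


if __name__ == "__main__":
    W = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    x = [1, -1, 2, 0]
    full = matvec(W, x)
    print(full)  # [5, 13, 21, 29]
    print(output_sharded_linear(W, x, world_size=2) == full)  # True
    print(input_sharded_linear(W, x, world_size=2) == full)  # True
```

Both sharded variants reproduce the unsharded result, but only after their respective collective. That is why a compiled tensor-parallel engine needs all-gather and all-reduce ops at those points.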
This is for the `scatter_reduce` decomposition where `include_self=False`.
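For reference, the semantics the decomposition must reproduce can be sketched in plain Python for the 1-D case. With `include_self=True`, every scattered `src` value is reduced into the existing value of `self`. With `include_self=False`, the original `self` value is ignored at any index that receives at least one scattered value, and indices that receive nothing keep their `self` value. This follows the documented behavior of `torch.Tensor.scatter_reduce_`. The helper name `scatter_reduce_1d` is hypothetical. It only covers `"sum"`, `"prod"`, `"amax"` and `"amin"` (not `"mean"`) and is not the decomposition added in this PR.

```python
from typing import Callable, Dict, List, Set

# Binary reduction ops keyed by the `reduce` argument names used by PyTorch.
REDUCERS: Dict[str, Callable[[float, float], float]] = {
    "sum": lambda a, b: a + b,
    "prod": lambda a, b: a * b,
    "amax": max,
    "amin": min,
}


def scatter_reduce_1d(
    self_vals: List[float],
    index: List[int],
    src: List[float],
    reduce: str,
    include_self: bool = True,
) -> List[float]:
    """Hypothetical 1-D reference for scatter_reduce semantics."""
    if len(index) != len(src):
        raise ValueError("index and src must have the same length")
    op = REDUCERS[reduce]
    out = list(self_vals)
    touched: Set[int] = set()
    for target, value in zip(index, src):
        if include_self or target in touched:
            out[target] = op(out[target], value)
        else:
            # First write to this index discards the original self value.
            out[target] = value
        touched.add(target)
    return out


if __name__ == "__main__":
    base, idx, vals = [10, 20, 30], [0, 0, 2], [1, 2, 3]
    print(scatter_reduce_1d(base, idx, vals, "sum", include_self=True))   # [13, 20, 33]
    print(scatter_reduce_1d(base, idx, vals, "sum", include_self=False))  # [3, 20, 3]
    print(scatter_reduce_1d(base, idx, vals, "amax", include_self=False)) # [2, 20, 3]
```

Index 1 receives no scattered value and keeps 20 in every case. This is the detail a decomposition for `include_self=False` has to preserve.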
The following lines:
```python
if (isinstance(input_val, TRTTensor)) and (
    input_val.dtype == trt.int8 or input_val.dtype == trt.int32
):
    input_val = cast_trt_tensor(ctx, input_val, trt.float32, name)
```
are present in almost all the...
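One way to avoid repeating this check in every converter is to factor it into a single helper. The sketch below is hypothetical. `DType`, `FakeTensor` and `fake_cast` are stand-ins for `trt.DataType`, `TRTTensor` and `cast_trt_tensor` so that the code runs without TensorRT. `cast_int_to_fp32_if_needed` is an illustrative name, not an existing Torch-TensorRT function.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class DType(Enum):
    """Stand-in for trt.DataType (hypothetical)."""
    INT8 = "int8"
    INT32 = "int32"
    FLOAT32 = "float32"


@dataclass
class FakeTensor:
    """Stand-in for TRTTensor (hypothetical)."""
    dtype: DType
    name: str = ""


def fake_cast(ctx: Any, tensor: FakeTensor, dtype: DType, name: str) -> FakeTensor:
    """Stand-in for cast_trt_tensor(ctx, tensor, dtype, name) (hypothetical)."""
    return FakeTensor(dtype=dtype, name=f"{name}_cast")


# Integer dtypes that the repeated snippet promotes to float32.
INT_DTYPES_TO_PROMOTE = (DType.INT8, DType.INT32)


def cast_int_to_fp32_if_needed(ctx: Any, input_val: Any, name: str) -> Any:
    """Cast int8/int32 tensors to float32; return every other input unchanged."""
    if isinstance(input_val, FakeTensor) and input_val.dtype in INT_DTYPES_TO_PROMOTE:
        return fake_cast(ctx, input_val, DType.FLOAT32, name)
    return input_val


if __name__ == "__main__":
    t = FakeTensor(DType.INT8, "x")
    print(cast_int_to_fp32_if_needed(None, t, "x").dtype)  # DType.FLOAT32
```

Each converter would then reduce the repeated block to a single call, `input_val = cast_int_to_fp32_if_needed(ctx, input_val, name)`. Non-tensor inputs such as Python scalars pass through unchanged, which matches the `isinstance` guard in the original snippet.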
In `torch_tensorrt/dynamo/_compiler.py`, `require_full_compilation` is not passed to `partitioning.fast_partition` or `partitioning.global_partition`.
This PR:
1. Adds an example for parallel rotary embedding
2. Adds logic for complex graph detection
3. Adds a pass for complex graph rewrite in `aten_lowering_pass`

Please note that...
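TensorRT has no complex data type, so a rotary-embedding graph that uses complex tensors has to be expressed in real arithmetic. A common approach stores each complex value as a `(real, imag)` pair and expands complex multiplication as (a + bi)(c + di) = (ac - bd) + (ad + bc)i. This approach may or may not match the rewrite pass in this PR. The plain-Python sketch below uses hypothetical function names. It rotates consecutive feature pairs by angles `theta` in both forms and checks that the results agree.

```python
import cmath
import math
from typing import List, Tuple

Pair = Tuple[float, float]


def complex_mul_as_real(x: Pair, y: Pair) -> Pair:
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i, using only real ops."""
    a, b = x
    c, d = y
    return (a * c - b * d, a * d + b * c)


def rotate_with_complex(features: List[float], theta: List[float]) -> List[complex]:
    """Reference: view consecutive feature pairs as complex and multiply by e^(i*theta)."""
    pairs = [complex(features[2 * k], features[2 * k + 1]) for k in range(len(features) // 2)]
    return [p * cmath.exp(1j * t) for p, t in zip(pairs, theta)]


def rotate_with_real_pairs(features: List[float], theta: List[float]) -> List[Pair]:
    """Rewritten form: the same rotation with (real, imag) pairs and no complex dtype."""
    pairs = [(features[2 * k], features[2 * k + 1]) for k in range(len(features) // 2)]
    return [complex_mul_as_real(p, (math.cos(t), math.sin(t))) for p, t in zip(pairs, theta)]


if __name__ == "__main__":
    feats = [1.0, 2.0, 3.0, 4.0]
    angles = [0.5, 1.25]
    ref = rotate_with_complex(feats, angles)
    out = rotate_with_real_pairs(feats, angles)
    agree = all(
        math.isclose(z.real, re, abs_tol=1e-12) and math.isclose(z.imag, im, abs_tol=1e-12)
        for z, (re, im) in zip(ref, out)
    )
    print(agree)  # True
```

Because e^(i*theta) = cos(theta) + i*sin(theta), the rewritten graph only needs elementwise multiply, add and subtract on real tensors. These are ops TensorRT supports directly.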
TRT-LLM download utility
This relates to #3448. The error currently encountered is: (div_2, sym_size_int_3669, mul_102, mul_114, reshape_default_3, mul_213, div_113, mul_2962, clone_54, select_1, clone_67, expand_122, slice_16, clone_68, expand_123, mul_4869, expand_184,...