Jimmy Zhang
[get_tensor_parallel_group](https://github.com/oyilmaz-nvidia/NeMo/blob/export-cleanup/nemo/export/trt_llm/tensor_utils.py#L47C5-L47C30) is no longer used. I think [this one](https://github.com/oyilmaz-nvidia/NeMo/blob/export-cleanup/nemo/export/trt_llm/tensor_utils.py#L24C5-L24C30) is unused too.
I believe [cpu_map_location](https://github.com/oyilmaz-nvidia/NeMo/blob/export-cleanup/nemo/export/trt_llm/utils.py#L63) duplicates [this](https://github.com/oyilmaz-nvidia/NeMo/blob/export-cleanup/nemo/export/trt_llm/nemo/convert.py#L26). It also seems to me that `cpu_map_location` and `gpu_map_location` should live in `trt_llm/nemo/nemo.py`, since that's the only place they're used.
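To make the suggestion concrete, here is a rough sketch of what a single shared definition of the two helpers could look like. The function bodies are my assumption of the intended behavior, not copied from the PR, and the `device_count` parameter is hypothetical (added so the sketch runs without a GPU). The only thing taken as given is that `torch.load` calls a `map_location` callable with `(storage, location_tag)`.

```python
# Sketch only: one definition of each helper, to be imported wherever needed.
# Bodies are assumptions about the intended behavior, not the PR's actual code.


def cpu_map_location(storage, loc):
    """map_location callable: always place the loaded storage on the CPU."""
    return storage.cpu()


def gpu_map_location(storage, loc, device_count=1):
    """map_location callable: remap a saved "cuda:N" tag onto local GPUs.

    device_count is a hypothetical parameter; a real version would likely
    query the number of visible GPUs instead.
    """
    if loc.startswith("cuda"):
        _, _, index = loc.partition(":")
        saved_idx = int(index) if index else 0
        # Wrap around when the checkpoint was saved on more GPUs than we have.
        return storage.cuda(saved_idx % device_count)
    if loc.startswith("cpu"):
        return storage.cpu()
    raise ValueError(f"Unhandled location tag: {loc}")
```

Usage would be along the lines of `torch.load(path, map_location=cpu_map_location)`; for the GPU variant the extra argument could be bound with `functools.partial`.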
[This function](https://github.com/oyilmaz-nvidia/NeMo/blob/export-cleanup/nemo/export/trt_llm/nemo/convert.py#L418) is only used by `build` and `refit`, so it can be removed. Eventually, though, we'll have to add it back as well when `build` and `refit` are...
It looks like an MPI bootstrap issue. This code path worked previously, so I'm not sure what changed. We can probably just switch to NCCL or Gloo bootstrap.