Derya Cavdar
Results: 2 issues by Derya Cavdar
Hi, what is the correct `fp8_group` when using FSDP and tensor parallelism together? Should it include all GPUs, or only the GPUs within each tensor-parallel group? Thanks.
documentation
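To make the two options in the question concrete, here is a minimal sketch of which ranks each candidate group would contain. It assumes the common convention that tensor-parallel ranks are contiguous (for example, 8 GPUs with a tensor-parallel size of 2). The helper `parallel_rank_groups` is hypothetical. In a real job, each rank list could be passed to `torch.distributed.new_group(ranks=...)` to build a process group, and "all GPUs" would simply be `list(range(world_size))`. The sketch only enumerates the candidates. It does not say which one `fp8_group` should be.

```python
def parallel_rank_groups(world_size: int, tp_size: int):
    """Hypothetical helper: return (tp_groups, dp_groups) for a 2-D layout.

    Assumes tensor-parallel (TP) ranks are contiguous, e.g. [0, 1], [2, 3], ...
    The data-parallel / FSDP groups are then the ranks that share the same
    position inside their TP group.
    """
    if world_size % tp_size != 0:
        raise ValueError("world_size must be divisible by tp_size")
    dp_size = world_size // tp_size
    # Each TP group is a contiguous block of tp_size ranks.
    tp_groups = [list(range(i * tp_size, (i + 1) * tp_size)) for i in range(dp_size)]
    # Each DP/FSDP group takes every tp_size-th rank, starting at offset j.
    dp_groups = [list(range(j, world_size, tp_size)) for j in range(tp_size)]
    return tp_groups, dp_groups


if __name__ == "__main__":
    tp, dp = parallel_rank_groups(world_size=8, tp_size=2)
    print("TP groups:", tp)    # [[0, 1], [2, 3], [4, 5], [6, 7]]
    print("FSDP groups:", dp)  # [[0, 2, 4, 6], [1, 3, 5, 7]]
```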
Hi, is there a way to enable a parallel residual, similar to the HF GPT-NeoX [use_parallel_residual](https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt_neox/configuration_gpt_neox.py#L78C9-L78C30) config option, to speed up training? @ksivaman If it is not currently supported, do you have any plans to...
enhancement
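For reference, here is a minimal sketch of how the two block layouts differ. It uses scalar stand-ins for the sublayers. `ln1`, `ln2`, `attn`, and `mlp` are placeholder callables, not Transformer Engine or Hugging Face APIs. In the sequential (standard pre-LN) layout, the MLP reads the output of the attention sublayer. With a parallel residual, as in GPT-NeoX, both sublayers read the same input and their outputs are summed, so the two no longer depend on each other.

```python
from typing import Callable

Sublayer = Callable[[float], float]


def sequential_residual(x: float, ln1: Sublayer, attn: Sublayer,
                        ln2: Sublayer, mlp: Sublayer) -> float:
    # Standard pre-LN block: the MLP sees the attention-updated hidden state.
    h = x + attn(ln1(x))
    return h + mlp(ln2(h))


def parallel_residual(x: float, ln1: Sublayer, attn: Sublayer,
                      ln2: Sublayer, mlp: Sublayer) -> float:
    # Parallel block: attention and MLP both read x; outputs are added to x.
    return x + attn(ln1(x)) + mlp(ln2(x))


if __name__ == "__main__":
    identity = lambda v: v          # stand-in for LayerNorm
    attn = lambda v: 2.0 * v        # toy "attention"
    mlp = lambda v: v + 1.0         # toy "MLP"
    print(sequential_residual(1.0, identity, attn, identity, mlp))  # 7.0
    print(parallel_residual(1.0, identity, attn, identity, mlp))    # 5.0
```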