Vitaliy Chiley

Results 64 comments of Vitaliy Chiley

Yes, the current setup depends on torch==1.13.1. [torch2 has some issues](https://github.com/mosaicml/llm-foundry/issues/57#issuecomment-1537225586), but we will upgrade when these are resolved. (Note: people have run our repo with torch2, but it does...

MPT is a GPT-style network. You'd want to create a conversion script, similar to [this one](https://github.com/NVIDIA/FasterTransformer/blob/c6e8f60ec40da218804a60e6aa986903e7fa8594/examples/pytorch/gpt/utils/huggingface_gpt_convert.py), for converting the MPT HF model into the FT format. When we write...
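The core of such a conversion script is mechanical: load the HF checkpoint, rename each weight tensor to FT's naming scheme, then write the tensors out (splitting attention/MLP weights per tensor-parallel rank). A minimal sketch of the key-renaming step — the MPT and FT parameter names below are illustrative assumptions, not the exact names either library uses:

```python
# Hypothetical sketch: map MPT-style HF state-dict keys to FT-style names.
# The concrete key names here are assumptions for illustration only.
MPT_TO_FT = {
    "wte.weight": "model.wte",
    "norm_f.weight": "model.final_layernorm.weight",
}

PER_LAYER = {
    "norm_1.weight": "input_layernorm.weight",
    "attn.Wqkv.weight": "attention.query_key_value.weight",
    "attn.out_proj.weight": "attention.dense.weight",
    "ffn.up_proj.weight": "mlp.dense_h_to_4h.weight",
    "ffn.down_proj.weight": "mlp.dense_4h_to_h.weight",
    "norm_2.weight": "post_attention_layernorm.weight",
}

def convert_key(hf_key: str) -> str:
    """Translate one HF state-dict key to its FT-style counterpart."""
    if hf_key.startswith("transformer."):
        hf_key = hf_key[len("transformer."):]
    if hf_key in MPT_TO_FT:
        return MPT_TO_FT[hf_key]
    # Per-layer keys look like "blocks.<i>.<param>".
    if hf_key.startswith("blocks."):
        _, layer_idx, param = hf_key.split(".", 2)
        return f"model.layers.{layer_idx}.{PER_LAYER[param]}"
    raise KeyError(f"unmapped key: {hf_key}")
```

The real FT converters also transpose/split each tensor for the target tensor-parallel size; the renaming table above is just the scaffolding you'd fill in from FT's expected layout.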

[The `*.c_*.*` naming](https://github.com/NVIDIA/FasterTransformer/blob/c6e8f60ec40da218804a60e6aa986903e7fa8594/examples/pytorch/gpt/utils/huggingface_gpt_convert.py#L135) makes me think they use 1x1 conv layers instead of linear layers (functionally the same thing; for some reason early transformer implementations used to do this, e.g....
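A kernel-size-1 convolution over a sequence applies the same weight matrix independently at every position, which is exactly what a linear layer does when applied token-wise. A dependency-free sketch of that equivalence:

```python
# A 1x1 "convolution" over a sequence is just a per-position matrix multiply,
# i.e. the same computation as a linear layer applied at each token.

def linear(x, W, b):
    """y = W @ x + b for one position (x: list of d_in floats)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def conv1x1(seq, W, b):
    """Width-1 conv: each output position depends only on the input
    at that same position, so the "kernel" is just a matrix."""
    return [linear(x_t, W, b) for x_t in seq]

def linear_tokenwise(seq, W, b):
    """A linear layer applied independently to every token."""
    return [linear(x_t, W, b) for x_t in seq]
```

The two produce identical outputs; in practice the layers differ only in weight layout (a conv weight carries an extra singleton kernel dimension, and HF GPT-2's `Conv1D` stores the transpose of `nn.Linear`'s weight), which is why converter scripts have to transpose some tensors.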

> Transformer Engine

We've played around with [TE and H100 FP8 support](https://www.mosaicml.com/blog/coreweave-nvidia-h100-part-1). It works, and we'll include everything when we have more seat time with H100s so we can test...

> change triton_flash_attention to flash_attention in yaml is a solution of this problem~

Flash attn does not support ALiBi, so you're technically not running the correct model...
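For reference, the attention implementation is selected in the model section of the training yaml, and switching away from the triton implementation silently drops ALiBi. A sketch of the relevant fragment — the field names follow llm-foundry's yaml conventions but should be treated as illustrative:

```yaml
model:
  name: mpt_causal_lm
  # ...
  attn_config:
    attn_impl: triton   # the triton kernel supports ALiBi
    alibi: true         # flash attn cannot honor this flag
```

Swapping `attn_impl` to `flash` makes the error go away, but the resulting model no longer applies ALiBi biases, so it is not the same model you trained.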

What version of torch are you using? Are you following the [requirements](https://github.com/mosaicml/llm-foundry#prerequisites) and installation instructions (which say to use torch1.13)? (This is an issue you would have if you are using...

The triton version installed with torch2 introduced breaking changes relative to the triton version our training setup needs. Please use torch1.13 as instructed in the [requirements](https://github.com/mosaicml/llm-foundry#prerequisites). torch2 is NOT supported...

[torch2 now works](https://github.com/mosaicml/llm-foundry/pull/149) 🥳 Note: our setup does install 2 versions of triton; please follow the install instructions. Closing the issue; if you still have issues, feel free to re-open the...

What setup are you using? Is it multi-GPU? `composer SCRIPT` will launch the script across multiple GPUs; `python SCRIPT` will not, but if there are multiple GPUs in your setup...
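Concretely, the difference between the two launch commands looks like this (the script and yaml paths are illustrative):

```shell
# composer is a distributed launcher: it spawns one process per GPU
# and sets up the process group for you.
composer scripts/train/train.py yamls/pretrain/mpt-125m.yaml

# plain python runs a single process; on a multi-GPU node the
# remaining GPUs sit idle unless you wire up distributed init yourself.
python scripts/train/train.py yamls/pretrain/mpt-125m.yaml
```

So on a multi-GPU box, using `python` instead of `composer` is the usual reason only one GPU shows activity.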

see https://github.com/mosaicml/llm-foundry/issues/143#issuecomment-1553334904