
Windows support?

Open deepbeepmeep opened this issue 1 year ago • 6 comments

Hello, I have been unable to run the model on Windows: the install fails because it requires Triton, which is only supported on Linux.

Any ideas?

Thanks in advance

deepbeepmeep avatar May 08 '23 20:05 deepbeepmeep

We haven't tested the model on Windows and strongly suggest you use Linux. The most convenient option is to use our recommended Ubuntu docker image.

However, if you'd like to try, you can comment out the Triton dependency and set attn_impl: torch to use the native PyTorch attention implementation.
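For reference, in an llm-foundry training YAML the setting sits under the model's `attn_config`; a minimal sketch (the surrounding keys are an assumption taken from typical llm-foundry configs — check your actual YAML):

```yaml
# Sketch of the relevant portion of an llm-foundry model config.
# Only attn_impl matters here; the surrounding structure is illustrative.
model:
  name: mpt_causal_lm
  attn_config:
    attn_impl: torch   # native PyTorch attention instead of Triton flash attention
```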

nik-mosaic avatar May 08 '23 22:05 nik-mosaic

Thanks. How do you set "attn_impl: torch"?

deepbeepmeep avatar May 09 '23 15:05 deepbeepmeep

OK, I have figured out how to do it.

Here are the results:

  • on Windows, it hangs forever after the "Warming up..." message. The reason could be that it is very slow because it is forced to run on the CPU.

  • on Linux (Windows WSL), initialization takes a few minutes and then it proceeds.

Is there a reason why the init is so slow? I use an M.2 drive with an i7-13700 / RTX 4090.

Moreover, it seems Triton cannot be used on Linux with my config because Ada (sm_89) is not supported by Triton.
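A quick way to check both suspicions (CPU fallback and the sm_89 limitation) is to ask PyTorch what it sees; a minimal sketch:

```python
import torch

# Report whether CUDA is usable and which compute capability the GPU has.
# Triton 2.0 lacked support for sm_89 (Ada / RTX 4090), so this also tells
# you whether the Triton attention path can work at all on this machine.
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"GPU: {torch.cuda.get_device_name(0)} (sm_{major}{minor})")
else:
    print("CUDA not available - everything will run on CPU (very slow)")
```

If this prints the CPU branch under Windows, that would explain the indefinite hang after "Warming up...".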

deepbeepmeep avatar May 09 '23 15:05 deepbeepmeep

FYI, Triton 2.1 supports Ada while the version you currently use (2.0) doesn't. llm-foundry won't let me use Triton 2.1 since version 2.0 is hardcoded.

deepbeepmeep avatar May 09 '23 16:05 deepbeepmeep

If WSL works for you, we recommend you use WSL. We unfortunately do not have the resources to test this repository on Windows.

If you'd like to upgrade the Triton version pin, you can try editing setup.py manually, but this may not work.

nik-mosaic avatar May 09 '23 17:05 nik-mosaic

Triton 2.0 is hardcoded inside the Hugging Face transformers code, so there is no way I can change it. Besides, my main problem right now is that initialization takes forever (a few minutes) on my high-end rig. Not much is going on except that one CPU core is at 100%. Is this expected? Any way to fix it?

deepbeepmeep avatar May 11 '23 21:05 deepbeepmeep

Hello @deepbeepmeep, unfortunately we do not have much experience with Windows to help diagnose this. If WSL works, I echo @nik-mosaic that we would recommend sticking with that.

Could you point to where Triton 2.0 is hardcoded in HuggingFace? You should be able to override the install after the transformers install with `pip install triton==2.1`.

hanlint avatar May 19 '23 18:05 hanlint