llm-sharp
Flash Attention and native PyTorch weights
I saw Flash Attention on the TODO list, so I wanted to bring the announcement here to your attention.
Two packages were announced there:
1] Loading model weights saved in the PyTorch / safetensors formats, including handling of HuggingFace's sharding (see the sketch after this list)
2] Flash Attention - self-explanatory :)
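
For what it's worth, the safetensors container itself is easy to inspect even without those packages. The sketch below is plain .NET, not the announced package's API (which I haven't reproduced here); it reads the JSON header that lists each tensor's name, dtype, shape, and data offsets. Those tensor names are exactly what has to line up with the module's registered component names, which relates to the renaming point below.

```csharp
using System.IO;
using System.Text;
using System.Text.Json;

static class SafetensorsHeader
{
    // A safetensors file starts with an 8-byte little-endian header length,
    // followed by that many bytes of UTF-8 JSON mapping each tensor name to
    // its dtype, shape, and byte offsets within the data section.
    public static JsonDocument Read(string path)
    {
        using var reader = new BinaryReader(File.OpenRead(path));
        ulong headerLength = reader.ReadUInt64(); // BinaryReader is always little-endian
        byte[] json = reader.ReadBytes(checked((int)headerLength));
        return JsonDocument.Parse(Encoding.UTF8.GetString(json));
    }
}
```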
Also, I saw in one of the Python scripts that you rename some of the weights to match the naming schemes between HuggingFace and llm-sharp. There is a useful attribute you can add to any field to specify the name you want TorchSharp to store it under:
```csharp
[ComponentName("some_name")]   // name used when the module's state is saved/loaded
private Module _someOtherName; // field keeps its C#-style name
```
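
As a fuller illustration, here is a hedged sketch of how that might look in an llm-sharp module. The class, field, and weight names (`q_proj` etc.) are made up for the example, and the exact `ComponentName` signature (constructor argument vs. a `Name` property) is worth checking against the TorchSharp version you build on:

```csharp
using TorchSharp;
using static TorchSharp.torch.nn;

// Hypothetical module: the field name follows C# conventions, while the
// ComponentName attribute (usage mirrors the snippet above) makes TorchSharp
// register and serialize the submodule under the HuggingFace-style name,
// so no separate renaming pass is needed when loading HF checkpoints.
class AttentionBlock : Module<torch.Tensor, torch.Tensor>
{
    [ComponentName("q_proj")] // hypothetical HF-style weight name
    private readonly Module<torch.Tensor, torch.Tensor> _queryProjection;

    public AttentionBlock(long dim) : base(nameof(AttentionBlock))
    {
        _queryProjection = Linear(dim, dim);
        RegisterComponents(); // picks up the attribute when registering fields
    }

    public override torch.Tensor forward(torch.Tensor x) => _queryProjection.forward(x);
}
```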