Liger-Kernel
Support the new Solar architecture
🚀 The feature, motivation and pitch
This model from Upstage is extremely strong among models that fit on a single GPU for training and inference: https://huggingface.co/upstage/solar-pro-preview-instruct. However, it uses a custom architecture, solar, which is based on Llama/Mistral but modifies the forward pass to add long-range residual connections. It would be awesome to support this architecture natively, out of the box!
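To make "long-range residual connections" concrete, here is a rough, framework-free sketch of the general idea: the input to an early layer is saved and added back into the input of a much later layer. This is only an illustration under my own assumptions, not the actual Solar forward pass. The function name, the `skip_map` layout (destination layer index to source layer index), and `skip_weight` are all hypothetical and do not come from the model's code or config.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def forward_with_long_range_residuals(
    hidden: Sequence[float],
    layers: Sequence[Layer],
    skip_map: Dict[int, int],
    skip_weight: float = 1.0,
) -> Vector:
    """Run `layers` in order, adding saved early inputs to later layers.

    `skip_map` maps a destination layer index to a source layer index
    (hypothetical layout, chosen for this sketch only). The input that
    was fed to the source layer is added, scaled by `skip_weight`, to
    the input of the destination layer.
    """
    for dst, src in skip_map.items():
        if not 0 <= src < dst < len(layers):
            raise ValueError(f"invalid skip connection {src} -> {dst}")

    sources = set(skip_map.values())
    saved: Dict[int, Vector] = {}
    state: Vector = list(hidden)

    for index, layer in enumerate(layers):
        # Long-range residual: mix in the input of an earlier layer.
        if index in skip_map:
            earlier = saved[skip_map[index]]
            state = [x + skip_weight * e for x, e in zip(state, earlier)]
        # Remember this layer's input if a later layer will need it.
        if index in sources:
            saved[index] = list(state)
        state = layer(state)

    return state


def add_one(vector: Vector) -> Vector:
    """Toy stand-in for a decoder layer: adds 1.0 to every element."""
    return [x + 1.0 for x in vector]


toy_layers = [add_one, add_one, add_one, add_one]
plain = forward_with_long_range_residuals([1.0, 2.0], toy_layers, {})
skipped = forward_with_long_range_residuals([1.0, 2.0], toy_layers, {3: 0})
print(plain)    # [5.0, 6.0]
print(skipped)  # [6.0, 8.0]
```

In a real model the layers would be transformer decoder blocks operating on tensors, and where the connections go and how they are weighted would have to be taken from the Solar modeling code itself.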
Alternatives
No response
Additional context
Thank you so much for this awesome project :)
I will try to work on it
That would be awesome @vulkomilev