Logan Hallee
Hi! I have nothing to do with Mistral but can answer your questions. Gates or routers are always linear layers, even in Switch Transformers. Regular linear layers, or sets of...
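To make that concrete, here is a minimal sketch of a top-k gate in PyTorch: the router itself is just a single `nn.Linear` over the hidden dimension. The names (`TopKRouter`, `n_experts`, `top_k`) are illustrative, not taken from Mistral's or the Switch Transformer code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKRouter(nn.Module):
    """Minimal MoE gate: the router is just a plain linear layer."""
    def __init__(self, hidden_dim: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.gate = nn.Linear(hidden_dim, n_experts, bias=False)  # this linear layer IS the gate
        self.top_k = top_k

    def forward(self, x: torch.Tensor):
        # x: (batch, seq, hidden_dim) -> logits: (batch, seq, n_experts)
        logits = self.gate(x)
        weights, expert_ids = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # normalize over the selected experts only
        return weights, expert_ids

router = TopKRouter(hidden_dim=512, n_experts=8, top_k=2)
w, ids = router(torch.randn(1, 4, 512))
print(w.shape, ids.shape)  # torch.Size([1, 4, 2]) torch.Size([1, 4, 2])
```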
I second this question. The field sorely needs automatic structuring of transformer-like diagrams, especially with the introduction of MoE and state space models. How to do this effectively?...
Parsing code to generate a diagram would be much better! For example, orienting all the layers correctly from the printout of a PyTorch model or something.
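As a rough sketch of the idea (purely illustrative, not an existing tool), you can walk a model's module tree with `named_children()` instead of scraping the printed repr; that already gives the nesting and declaration order a diagram layout would need.

```python
import torch.nn as nn

def outline(module: nn.Module, name: str = "model", depth: int = 0):
    """Recursively print the layer hierarchy in declaration order,
    i.e. the structure a diagram generator would consume."""
    print("  " * depth + f"{name}: {module.__class__.__name__}")
    for child_name, child in module.named_children():
        outline(child, child_name, depth + 1)

# Toy model standing in for a real transformer stack.
toy = nn.Sequential(
    nn.Embedding(1000, 64),
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    nn.Linear(64, 1000),
)
outline(toy)
```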
Yep, will be happy to help make some tests and docs for each algorithm after we finish commenting the tensor shape at each step.
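For what it's worth, this is the style of shape comment I have in mind, shown on a toy attention-score function (the names are made up for the example):

```python
import torch

def scaled_scores(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    # q: (batch, heads, seq, head_dim)
    # k: (batch, heads, seq, head_dim)
    scores = q @ k.transpose(-2, -1)      # (batch, heads, seq, seq)
    return scores / q.shape[-1] ** 0.5    # (batch, heads, seq, seq)

q = torch.randn(2, 8, 16, 64)
k = torch.randn(2, 8, 16, 64)
print(scaled_scores(q, k).shape)  # torch.Size([2, 8, 16, 16])
```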
@mariosasko, @lhoestq, @albertvillanova Hello! Can anyone help? Or can you suggest who can help with this?
> Hi! Feel free to download the dataset and create a `Dataset` object with it.
>
> Then you'll be able to use `push_to_hub()` to upload the dataset to...
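For reference, the basic flow being described is something like this; the column names and repo id below are placeholders, not the actual dataset:

```python
from datasets import Dataset

# Build a Dataset from in-memory data (load_dataset("csv", ...) etc. also works).
ds = Dataset.from_dict({
    "sequence": ["MKTAYIAK", "GAVLILLV"],  # placeholder rows
    "label": [0, 1],
})

# Requires being logged in (`huggingface-cli login`) or passing a token.
ds.push_to_hub("your-username/your-dataset-name")
```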
Hi @lhoestq and @albertvillanova, just following up on this.
Sure, that makes sense. However, isn't there a size limit to what typical users can push?
> Yes, there is a limit; simply let us know by email at datasets [at] huggingface.co - this way we can give you a storage grant and also help making sure...
Hi @bj600800 @peiyaoli @lzygitk7 @chaofan520 @tangmeiaoxue1 @sermare @nullland1027! My group has put together an implementation of ESMC called [ESM++](https://huggingface.co/Synthyra/ESMplusplus_small) that is completely Hugging Face compatible. It loads with AutoModel and...
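A minimal loading sketch, assuming the custom architecture needs `trust_remote_code=True` (the model card has the exact recommended usage):

```python
from transformers import AutoModel

# trust_remote_code=True is an assumption here, since ESM++ ships custom modeling code;
# check the model card for the recommended snippet and tokenization details.
model = AutoModel.from_pretrained("Synthyra/ESMplusplus_small", trust_remote_code=True)
print(sum(p.numel() for p in model.parameters()))  # sanity check: parameter count
```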