Shen Zhuoran
[BeautyNet](https://github.com/cms-flash/beauty-net) is a high-quality, minimalist template for research projects in PyTorch.
The [config file](https://github.com/bigscience-workshop/bigscience/blob/b4a4f4651771cb78297abe5074aaf2de1f92d6ce/train/tr11-176B-ml/setup-test-n2.slurm) lists a dataset size of 220M samples and a global batch size of 2048, which works out to ~107K steps per epoch. The [main README](https://huggingface.co/bigscience/bloom/blob/main/README.md) says...
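The steps-per-epoch arithmetic can be checked with a few lines of Python. This sketch assumes exactly 220M samples, counts a final partial batch as one step, and ignores any batch-size warm-up schedule the trainer may apply. The helper name `steps_per_epoch` is illustrative.

```python
import math


def steps_per_epoch(num_samples: int, global_batch_size: int) -> int:
    """Optimizer steps needed to see every sample once (a final partial batch counts as a step)."""
    return math.ceil(num_samples / global_batch_size)


# Figures as quoted from the config: 220M samples, global batch size 2048.
steps = steps_per_epoch(220_000_000, 2048)
print(steps)  # 107422
```

If the trainer drops the last partial batch, the count is 107,421. Either way it rounds to ~107K.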
Could you make a release when the guide to using the pretrained models on our own papers is available?
Is there a TensorFlow/Keras implementation of Adan? If there is no official version, do you know of any third-party implementation? Alternatively, how many lines would you expect an implementation to take?...
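For a sense of scale, here is a minimal pure-Python sketch of one Adan step on a flat list of parameters. It is written from my reading of the update rule in the Adan paper. Several details are assumptions for illustration: the β convention (0.98, 0.92, 0.99), the omission of bias correction, and the names `AdanState`, `init_adan_state`, and `adan_step`. This is not the official implementation. A Keras port would apply the same per-element rule to tensors inside an optimizer's per-variable update.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class AdanState:
    m: List[float]                          # EMA of gradients
    v: List[float]                          # EMA of gradient differences g_k - g_{k-1}
    n: List[float]                          # EMA of squared Nesterov-style gradients
    prev_grad: Optional[List[float]] = None


def init_adan_state(num_params: int) -> AdanState:
    return AdanState([0.0] * num_params, [0.0] * num_params, [0.0] * num_params)


def adan_step(params: Sequence[float], grads: Sequence[float], state: AdanState,
              lr: float = 1e-3, betas: Tuple[float, float, float] = (0.98, 0.92, 0.99),
              eps: float = 1e-8, weight_decay: float = 0.0) -> List[float]:
    """One Adan update (sketch, no bias correction); mutates `state`, returns new params."""
    b1, b2, b3 = betas
    # On the first step there is no previous gradient, so the difference term is zero.
    prev = state.prev_grad if state.prev_grad is not None else list(grads)
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        diff = g - prev[i]
        state.m[i] = b1 * state.m[i] + (1 - b1) * g
        state.v[i] = b2 * state.v[i] + (1 - b2) * diff
        nesterov_g = g + b2 * diff
        state.n[i] = b3 * state.n[i] + (1 - b3) * nesterov_g ** 2
        step = lr * (state.m[i] + b2 * state.v[i]) / (math.sqrt(state.n[i]) + eps)
        # Weight decay in proximal form: divide by (1 + lr * weight_decay).
        new_params.append((p - step) / (1 + lr * weight_decay))
    state.prev_grad = list(grads)
    return new_params


# Usage: a few steps on f(x) = x^2, whose gradient is 2x.
x = [5.0]
state = init_adan_state(1)
for _ in range(20):
    x = adan_step(x, [2 * x[0]], state, lr=0.1)
```

The core rule is roughly ten lines. A port would therefore likely consist mostly of optimizer boilerplate: state slots, hyperparameters, and serialization.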
Hi Yimian, I came here from your WACV 2021 presentation. This work looks pretty impressive. As we discussed during your presentation, could you share the inference time data for different...
Add an option to use the [CUHK online dictionary](https://humanum.arts.cuhk.edu.hk/Lexis/lexi-can/) instead of the local dictionary. Online queries are slower but more up-to-date.
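One possible shape for such an option is sketched below in Python. The online lookup is wrapped in a cache to offset the slower queries. The local dictionary serves as a fallback for misses and network errors. `fetch_online` stands for a hypothetical function that queries the CUHK site; its request format is not shown here. All names are illustrative.

```python
from typing import Callable, Dict, List, Optional

# A lookup maps a character to its readings, or None if it is unknown.
Lookup = Callable[[str], Optional[List[str]]]


class CachedOnlineDictionary:
    """Wraps a slow online lookup so each character is fetched at most once."""

    def __init__(self, fetch: Lookup) -> None:
        self._fetch = fetch
        self._cache: Dict[str, Optional[List[str]]] = {}

    def lookup(self, char: str) -> Optional[List[str]]:
        if char not in self._cache:
            # If the fetch raises, nothing is cached and the next call retries.
            self._cache[char] = self._fetch(char)
        return self._cache[char]


def make_dictionary(use_online: bool, local_table: Dict[str, List[str]],
                    fetch_online: Lookup) -> Lookup:
    """Return a lookup function; `use_online` is the hypothetical user-facing option."""
    local = local_table.get
    if not use_online:
        return local
    online = CachedOnlineDictionary(fetch_online)

    def lookup(char: str) -> Optional[List[str]]:
        try:
            result = online.lookup(char)
        except OSError:
            # Network failure: fall back to the local dictionary.
            return local(char)
        return result if result is not None else local(char)

    return lookup
```

Caching at the character level keeps repeated lookups in a long text from hitting the network more than once per character.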
Hi Ziming, in Section 6 of your paper, you mentioned that KANs are in practice 10x slower than MLPs. I am curious what you meant by that. Did you mean a...
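One factor that could feed into such a gap is parameter count. At equal width and depth, each KAN edge carries (G + k) spline coefficients where an MLP edge carries one weight. The KAN paper's own estimate is O(N²L(G + k)) parameters versus O(N²L) for an MLP. Parameter count is not wall-clock time, so this is only one candidate reading. The counting sketch below covers only the spline coefficients and ignores any extra per-edge weights an implementation may add. The grid size and spline order are illustrative values, and the helper names are hypothetical.

```python
from typing import Sequence


def mlp_param_count(widths: Sequence[int]) -> int:
    """Weights plus biases of a fully connected MLP with the given layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))


def kan_param_count(widths: Sequence[int], grid: int = 5, order: int = 3) -> int:
    """Spline coefficients only: (grid + order) B-spline coefficients per edge.

    Ignores extra per-edge terms (e.g. base-function or scale weights).
    """
    return sum(n_in * n_out * (grid + order) for n_in, n_out in zip(widths, widths[1:]))


widths = [2, 5, 1]
print(mlp_param_count(widths))  # 21
print(kan_param_count(widths))  # 120
```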
In addition to applying TokenMix before the first Transformer block, have you considered or tried applying it in the middle of the model?
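To make the question concrete, here is a minimal sketch of token-level mixing applied after an arbitrary number of blocks. Both inputs run separately through the first `mix_at` blocks. A subset of token positions is then swapped from B into A, and the remaining blocks process the mixed sequence. Setting `mix_at = 0` corresponds to mixing at the input. As I understand the TokenMix paper, it uses structured masks and assigns targets from a teacher's activation maps. This sketch instead swaps in a uniformly random token subset and weights the targets by token count. All function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Token = List[float]
Block = Callable[[List[Token]], List[Token]]


def token_mix(tokens_a: Sequence[Token], tokens_b: Sequence[Token], ratio: float,
              rng: random.Random) -> Tuple[List[Token], float]:
    """Swap round(ratio * n) random positions from B into A; return (mixed, lambda_A)."""
    if len(tokens_a) != len(tokens_b):
        raise ValueError("token sequences must have the same length")
    n = len(tokens_a)
    num_from_b = round(ratio * n)
    from_b = set(rng.sample(range(n), num_from_b))
    mixed = [list(tb) if i in from_b else list(ta)
             for i, (ta, tb) in enumerate(zip(tokens_a, tokens_b))]
    return mixed, 1 - num_from_b / n


def mix_targets(target_a: Sequence[float], target_b: Sequence[float], lam: float) -> List[float]:
    """Soft label weighted by the fraction of tokens from each source."""
    return [lam * a + (1 - lam) * b for a, b in zip(target_a, target_b)]


def forward_with_mix(blocks: Sequence[Block], x_a: List[Token], x_b: List[Token],
                     mix_at: int, ratio: float, rng: random.Random) -> Tuple[List[Token], float]:
    """Run both inputs through blocks[:mix_at], mix tokens, then run blocks[mix_at:]."""
    h_a, h_b = x_a, x_b
    for block in blocks[:mix_at]:
        h_a, h_b = block(h_a), block(h_b)
    h, lam = token_mix(h_a, h_b, ratio, rng)
    for block in blocks[mix_at:]:
        h = block(h)
    return h, lam
```

Mixing mid-network costs an extra partial forward pass for the second sample, since both inputs must go through the first `mix_at` blocks separately.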