gpt-neox
Please investigate Retrieval-Enhanced Transformers (RETRO)
Is your feature request related to a problem? Please describe.
Training very large networks takes a lot of time and requires resources that are unavailable to many small organizations. It's a well-known problem.
Describe the solution you'd like
Please investigate the Retrieval-Enhanced Transformer, because it requires 25× fewer parameters than contemporary large models while performing roughly on par with GPT-3, and it is actually maintainable.
A working open source implementation would be a complete game changer.
Describe alternatives you've considered
FPGA-accelerated sparse networks. However, FPGAs are notoriously hard to program and thus not very practical.
https://numenta.com/blog/2021/05/27/fast-and-accurate-sparse-networks
Additional context
Retrieval-Enhanced Transformer (RETRO) https://arxiv.org/abs/2112.04426
"...a 7.5 billion parameter RETRO model outperforms the 175 billion parameter Jurassic-1 on 10 out of 16 datasets and outperforms the 280B Gopher on 9 out of 16 datasets."
https://deepmind.com/research/publications/2021/improving-language-models-by-retrieving-from-trillions-of-tokens
I was just looking in this direction to bring back the size of the model towards a consumer-level size.
It seems DeepSpeed's team has done some work on this front, although it is for REALM, a similar model, rather than RETRO: https://github.com/microsoft/DeepSpeedExamples/tree/174ae3bc8dbb688cfaccb4afa15d6e2cdbe19ce5/Megatron-LM-v1.1.5-ZeRO3#realm-pipeline
The big resource demand is extracting the embeddings from the Pile / Wikipedia and then doing the retraining using those.
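For intuition, that preprocessing step amounts to: chunk the corpus, embed every chunk with a frozen encoder, and build a nearest-neighbour index that the model queries during training. A minimal sketch is below; the random-projection embedding and brute-force search are toy stand-ins (the paper uses frozen BERT embeddings and an approximate-nearest-neighbour index such as faiss/ScaNN), and names like `embed_chunk` and `build_index` are illustrative, not from any of the linked codebases.

```python
import numpy as np

CHUNK_LEN = 64    # RETRO retrieves per 64-token chunk, per the paper
EMBED_DIM = 128   # toy dimension; real pipelines use BERT-sized embeddings
VOCAB = 1000      # toy vocabulary size

rng = np.random.default_rng(0)
# Toy stand-in for a frozen encoder: a fixed random projection from
# bag-of-tokens count vectors into an embedding space.
projection = rng.standard_normal((VOCAB, EMBED_DIM))

def embed_chunk(token_ids):
    """L2-normalized bag-of-tokens embedding (toy encoder)."""
    counts = np.bincount(token_ids, minlength=VOCAB).astype(np.float64)
    v = counts @ projection
    return v / (np.linalg.norm(v) + 1e-9)

def build_index(corpus_chunks):
    """Precompute embeddings for every retrieval chunk.

    This is the expensive one-off pass over the Pile / Wikipedia
    that the comment above refers to.
    """
    return np.stack([embed_chunk(c) for c in corpus_chunks])

def retrieve(index, query_chunk, k=2):
    """Brute-force k-nearest-neighbour lookup by cosine similarity.

    At scale this is replaced by an approximate index (e.g. faiss).
    """
    q = embed_chunk(query_chunk)
    sims = index @ q
    return np.argsort(-sims)[:k]

# Usage: chunk a toy "corpus" of token ids, index it, query it.
corpus = [rng.integers(0, VOCAB, CHUNK_LEN) for _ in range(100)]
index = build_index(corpus)
neighbours = retrieve(index, corpus[7], k=2)
```

The retrieved neighbour chunks are what RETRO's decoder then attends to through chunked cross-attention during training and inference.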
There is already a PyTorch implementation from @lucidrains (no trained weights though): https://github.com/lucidrains/RETRO-pytorch
If someone would like to look at integrating RETRO with this library, I will happily train a model using it.
I'd love to give it a shot next week, unless there's something I'm forgetting about that's a higher priority @StellaAthena
Go ahead!
RETRO is now officially supported in Megatron-LM! https://github.com/NVIDIA/Megatron-LM#retro
I'm new to the neox/megatron codebase, but if someone is willing to offer some advice/suggestions on how to get started with this process, I'd love to give it a shot!