gpt-neox
Please investigate Retrieval-Enhanced Transformers (RETRO)
Is your feature request related to a problem? Please describe.
Training very large networks takes a lot of time and requires resources that are unavailable to many small organizations. It's a well-known problem.
Describe the solution you'd like
Please investigate the Retrieval-Enhanced Transformer, because it requires 25× fewer parameters than contemporary large models while performing roughly on par with GPT-3, and it is actually maintainable.
A working open source implementation would be a complete game changer.
Describe alternatives you've considered
FPGA-accelerated sparse networks. However, FPGAs are notoriously hard to program and thus not very practical.
https://numenta.com/blog/2021/05/27/fast-and-accurate-sparse-networks
Additional context
Retrieval-Enhanced Transformer (RETRO) https://arxiv.org/abs/2112.04426
"...a 7.5 billion parameter RETRO model outperforms the 175 billion parameter Jurassic-1 on 10 out of 16 datasets and outperforms the 280B Gopher on 9 out of 16 datasets."
https://deepmind.com/research/publications/2021/improving-language-models-by-retrieving-from-trillions-of-tokens
I was just looking in this direction to bring back the size of the model towards a consumer-level size.
It seems DeepSpeed's team has done some work on this front, although it is for REALM, a similar model, rather than RETRO: https://github.com/microsoft/DeepSpeedExamples/tree/174ae3bc8dbb688cfaccb4afa15d6e2cdbe19ce5/Megatron-LM-v1.1.5-ZeRO3#realm-pipeline
The big resource demand is extracting the embeddings from the Pile / Wikipedia and then doing the retraining using those.
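For intuition, that preprocessing step amounts to: chunk the corpus, embed every chunk with a frozen encoder, and build a nearest-neighbour index that the model queries during training. A minimal sketch is below; the random-projection embedding and brute-force search are toy stand-ins (the paper uses frozen BERT embeddings and an approximate-nearest-neighbour index such as faiss/ScaNN), and names like `embed_chunk` and `build_index` are illustrative, not from any of the linked codebases.

```python
import numpy as np

CHUNK_LEN = 64    # RETRO retrieves per 64-token chunk, per the paper
EMBED_DIM = 128   # toy dimension; real pipelines use BERT-sized embeddings
VOCAB = 1000      # toy vocabulary size

rng = np.random.default_rng(0)
# Toy stand-in for a frozen encoder: a fixed random projection from
# bag-of-tokens count vectors into an embedding space.
projection = rng.standard_normal((VOCAB, EMBED_DIM))

def embed_chunk(token_ids):
    """L2-normalized bag-of-tokens embedding (toy encoder)."""
    counts = np.bincount(token_ids, minlength=VOCAB).astype(np.float64)
    v = counts @ projection
    return v / (np.linalg.norm(v) + 1e-9)

def build_index(corpus_chunks):
    """Precompute embeddings for every retrieval chunk.

    This is the expensive one-off pass over the Pile / Wikipedia
    that the comment above refers to.
    """
    return np.stack([embed_chunk(c) for c in corpus_chunks])

def retrieve(index, query_chunk, k=2):
    """Brute-force k-nearest-neighbour lookup by cosine similarity.

    At scale this is replaced by an approximate index (e.g. faiss).
    """
    q = embed_chunk(query_chunk)
    sims = index @ q
    return np.argsort(-sims)[:k]

# Usage: chunk a toy "corpus" of token ids, index it, query it.
corpus = [rng.integers(0, VOCAB, CHUNK_LEN) for _ in range(100)]
index = build_index(corpus)
neighbours = retrieve(index, corpus[7], k=2)
```

The retrieved neighbour chunks are what RETRO's decoder then attends to through chunked cross-attention during training and inference.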
There is already a PyTorch implementation from @lucidrains (no trained weights though): https://github.com/lucidrains/RETRO-pytorch
If someone would like to look at integrating RETRO with this library, I will happily train a model using it.
I'd love to give it a shot next week, unless there's something I'm forgetting about that's a higher priority @StellaAthena
Go ahead!
RETRO is now officially supported in Megatron-LM! https://github.com/NVIDIA/Megatron-LM#retro
I'm new to the neox/megatron codebase, but if someone is willing to offer some advice/suggestions on how to get started with this process, I'd love to give it a shot!