neural-api
Code GPT-3 Small from the paper Language Models are Few-Shot Learners
Code an architecture similar to GPT-3 Small from the paper "Language Models are Few-Shot Learners".
As per Table 2.1 of the paper, GPT-3 Small is composed of
- 12 transformer decoder layers,
- a hidden dimension of 768 and
- an intermediate (feed-forward) dimension of 3072, i.e. four times the hidden dimension.
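As a sanity check on these dimensions, the following Python sketch counts the trainable weights of the decoder stack alone. It is not neural-api code. The names `DecoderConfig`, `params_per_layer` and `params_in_stack` are hypothetical, and it assumes a standard decoder layer: four attention projections with biases, a two-layer feed-forward block with biases, and two layer norms. Token and position embeddings are left out because the vocabulary size and context length are not given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderConfig:
    """Hypothetical container for the three sizes listed above."""
    n_layers: int = 12   # transformer decoder layers
    d_model: int = 768   # hidden dimension
    d_ff: int = 3072     # intermediate (feed-forward) dimension


def params_per_layer(cfg: DecoderConfig) -> int:
    """Count weights in one decoder layer (assumed standard layout)."""
    d, f = cfg.d_model, cfg.d_ff
    # Query, key, value and output projections: 4 matrices of d x d plus biases.
    attention = 4 * d * d + 4 * d
    # Feed-forward block: d -> f and f -> d, each with a bias vector.
    feed_forward = d * f + f + f * d + d
    # Two layer norms, each with a scale and a shift vector of size d.
    layer_norms = 2 * (2 * d)
    return attention + feed_forward + layer_norms


def params_in_stack(cfg: DecoderConfig) -> int:
    """Count weights in all decoder layers, excluding embeddings."""
    return cfg.n_layers * params_per_layer(cfg)


if __name__ == "__main__":
    cfg = DecoderConfig()
    print("per layer:", params_per_layer(cfg))
    print("decoder stack:", params_in_stack(cfg))
```

Under these assumptions the decoder stack accounts for roughly 85 million weights; the remainder of the model's reported size would come from the embeddings, which this sketch does not model.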