emergent_in_context_learning
Have you experimented with smaller models?
Hi, in the paper, a transformer model with 12 layers and an embedding size of 64 is used to validate your hypothesis. Did you run any trial experiments on smaller models? If so, what were the results?