grok-1
Add Exceptions to LanguageModel and question on DenseBlock Impl
Adds some exception handling to LanguageModel and a comment about the implementation of DenseBlock. Usually the widening -> GELU -> projection steps are applied sequentially, but this implementation doesn't do that. Is this an intentional detail?
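To make the question concrete, here is a minimal pure-Python sketch of the two layouts being contrasted. The function and weight names (`sequential_ffn`, `gated_ffn`, `w_gate`, `w_value`, and so on) are hypothetical and not taken from the repository. The gated form is just one common non-sequential layout, similar to GLU-style feed-forward blocks, so whether it matches DenseBlock exactly should be checked against the source.

```python
import math


def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def linear(x, w):
    # x: list of d_in floats; w: d_in x d_out nested list (no bias term)
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def sequential_ffn(x, w_in, w_out):
    # The "usual" layout: widen -> GELU -> project, one step after another.
    hidden = [gelu(v) for v in linear(x, w_in)]
    return linear(hidden, w_out)


def gated_ffn(x, w_gate, w_value, w_out):
    # Assumed non-sequential layout: two parallel widenings of the same input,
    # one passed through GELU (the gate), multiplied elementwise, then projected.
    gate = [gelu(v) for v in linear(x, w_gate)]
    value = linear(x, w_value)
    hidden = [g * v for g, v in zip(gate, value)]
    return linear(hidden, w_out)
```

With identity weights, the two differ only by the extra multiplicative `value` branch: `gated_ffn` scales each activated unit by the un-activated input, so the layouts are not interchangeable without retraining.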
I'm kind of curious too: does this degrade model performance?
how the turn tables