Results: 2 issues of billwu
I just tried making some small changes to the '2b' model: 1. Limit max_position_embeddings from 8096 to 256. :) 2. Trim the kv-cache in GemmaAttention to max_position_embeddings (256). 3. Unlimit the output...
type:support
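For readers unfamiliar with the idea, here is a minimal sketch of what a kv-cache bounded to max_position_embeddings entries could look like. It is only an illustration under assumptions: the `RollingKVCache` class is hypothetical, it is not the code from the issue or from GemmaAttention, and dropping the oldest entries when the cache is full is just one possible reading of items 2 and 3.

```python
from collections import deque

# Value quoted in the issue; everything else below is a hypothetical illustration.
MAX_POSITION_EMBEDDINGS = 256


class RollingKVCache:
    """Hypothetical helper: keeps only the most recent `max_len` key/value entries."""

    def __init__(self, max_len=MAX_POSITION_EMBEDDINGS):
        self.max_len = max_len
        # A deque with maxlen silently discards the oldest item once it is full.
        self.keys = deque(maxlen=max_len)
        self.values = deque(maxlen=max_len)

    def append(self, key, value):
        """Store the key/value pair for one new token position."""
        self.keys.append(key)
        self.values.append(value)

    def __len__(self):
        return len(self.keys)


# Feed in more positions than the cache can hold (placeholders stand in for tensors).
cache = RollingKVCache()
for pos in range(300):
    cache.append(("k", pos), ("v", pos))
```

After 300 appends the cache holds only the last 256 positions, so memory stays fixed while generation can continue past the limit. A real attention layer would store tensors and would also have to handle position encodings for the retained window, which this sketch leaves out.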
There may be some benefits, listed below: 1. The code could be simpler. 2. The inference could be faster. 3. The inference can accept multiple tokens this way. There are some...