
Issues about the 3B model

Open Con6924 opened this issue 8 months ago • 1 comments

Thanks for your fascinating work!

I'm trying out the 3B model and have run into two issues:

  1. The JSON config for the 3B model is missing. I tried adapting the XXL version's JSON to match the 3B checkpoint and the statistics in the paper, but then ran into a second issue.
  2. `ValueError: Head size 100 is not supported by PagedAttention. Supported head sizes are: [64, 80, 96, 112, 128, 256].` (raised from xformers).
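
For anyone hitting the same error, here is a minimal sketch of how to check a config before loading it. It assumes the head size is the hidden width divided by the number of attention heads, and the supported list is copied from the error message in item 2. The function names and the values 3200 and 32 are hypothetical, chosen only because they produce a head size of 100; they are not taken from an official 3B config.

```python
# Supported head sizes, copied verbatim from the error message in item 2.
SUPPORTED_HEAD_SIZES = [64, 80, 96, 112, 128, 256]


def head_size(hidden_size: int, num_heads: int) -> int:
    """Per-head dimension, assuming the hidden width is split evenly across heads."""
    if hidden_size % num_heads != 0:
        raise ValueError(
            f"hidden_size={hidden_size} is not divisible by num_heads={num_heads}"
        )
    return hidden_size // num_heads


def is_supported(hidden_size: int, num_heads: int) -> bool:
    """True if the resulting head size is in the list from the error message."""
    return head_size(hidden_size, num_heads) in SUPPORTED_HEAD_SIZES


# Hypothetical values (an assumption, not an official config): they give 100.
hidden_size, num_heads = 3200, 32
size = head_size(hidden_size, num_heads)
print(f"head size = {size}, supported = {is_supported(hidden_size, num_heads)}")
```

If the check fails for the values in the edited JSON, the problem is the head size itself rather than a typo in the config, so editing the JSON alone would not make the error go away.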

Con6924, Jun 11 '24 11:06