DiDev

3 issues opened by DiDev

Hi, an assert fails in tensor/mod.rs when the number of heads equals the number of layers. I'm not sure whether that's a bad choice of values, but nanogpt supports equal values; see...

Hi @keyvank, let's say I want to add a new decoder layer (the kind constructed in the 0..num_layers loop) at run time, after the gpt::new() call,...

enhancement

Hi, thank you for this wonderful library. I'm wondering whether there is a way to use the recent RoPE paper's scaling workaround with x_transformers. I have seen your recent change...