Leo-Lifeblood

Results: 5 comments of Leo-Lifeblood

The error is that the batch dimension (in this case 32) is being deleted. This breaks any training loop.
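As a rough illustration of the symptom being described (a hypothetical repro, assuming torchtune's `RotaryPositionalEmbeddings` where `dim` is the per-head dimension and the input is shaped `[batch, seq_len, num_heads, head_dim]`):

```python
# Hypothetical repro sketch -- assumes torchtune's RotaryPositionalEmbeddings,
# where `dim` is the per-head dimension and the input is shaped
# [batch, seq_len, num_heads, head_dim].
import torch
from torchtune.modules import RotaryPositionalEmbeddings

batch, seq_len, num_heads, head_dim = 32, 10, 4, 8

rope = RotaryPositionalEmbeddings(dim=head_dim, max_seq_len=4096)
x = torch.rand(batch, seq_len, num_heads, head_dim)
out = rope(x)

# If the shapes line up, the batch dimension (32) should survive the call unchanged.
assert out.shape == (batch, seq_len, num_heads, head_dim), out.shape
```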

> Hi @Leo-Lifeblood thanks for creating the issue. I believe you are seeing this error because you're using the `RotaryPositionalEmbeddings` class with an input tensor shape that doesn't line up...

The RoPE implementation somehow ends up with 1/8th the required batch dimension: `rope.cache.shape` is `torch.Size([4096, 16, 2])`; `rope_cache = rope.cache[:10]`; `torch.rand(32, 10, 4, 8).reshape(*torch.rand(32,...`
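For context, a sketch of where those shapes could come from (the constructor values below are assumptions chosen to reproduce the reported `[4096, 16, 2]`, assuming a cache layout of `[max_seq_len, dim // 2, 2]` holding cos/sin pairs):

```python
# Sketch of the shapes quoted above (the dim/max_seq_len values are assumptions).
import torch
from torchtune.modules import RotaryPositionalEmbeddings

# Passing dim=32 (the full embedding width) rather than head_dim=8 gives a
# cache of [4096, 32 // 2, 2] == [4096, 16, 2], matching the reported shape.
rope = RotaryPositionalEmbeddings(dim=32, max_seq_len=4096)
print(rope.cache.shape)       # torch.Size([4096, 16, 2])

rope_cache = rope.cache[:10]  # first 10 positions, as in the snippet above
print(rope_cache.shape)       # torch.Size([10, 16, 2])

# The input tensor from the snippet, shaped [batch, seq, heads, head_dim].
x = torch.rand(32, 10, 4, 8)
print(x.shape)                # torch.Size([32, 10, 4, 8])
```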

OK, I have tried what you suggested, but it has not worked. I have the code below, and I'll try to explain what's wrong with it from my perspective:...

I mean, sure, in most cases yes it is a bad idea; however, that goes for most things in life. For instance, it's generally not a good idea not to...