Leo-Lifeblood
The error is that the batch dimension (32 in this case) is being deleted, which breaks any training loop.
> Hi @Leo-Lifeblood thanks for creating the issue. I believe you are seeing this error because you're using the `RotaryPositionalEmbeddings` class with an input tensor shape that doesn't line up...
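> A minimal sketch of the input shape I'd expect to work (assuming torchtune's `RotaryPositionalEmbeddings`, where `dim` is the per-head dimension and `forward` takes `[batch, seq_len, num_heads, head_dim]` -- please double-check against the version you're on):
>
> ```python
> import torch
> from torchtune.modules import RotaryPositionalEmbeddings
>
> # hypothetical example values matching the shapes in this thread
> batch, seq_len, num_heads, head_dim = 32, 10, 4, 8
> rope = RotaryPositionalEmbeddings(dim=head_dim, max_seq_len=4096)
>
> x = torch.rand(batch, seq_len, num_heads, head_dim)
> out = rope(x)
> print(out.shape)  # torch.Size([32, 10, 4, 8]) -- the batch dim is preserved
> ```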
the rope implementation somehow ends up with 1/8th the required batch dimension: `rope.cache.shape` is `torch.Size([4096, 16, 2])`, then `rope_cache = rope.cache[:10]`, then `torch.rand(32, 10, 4, 8).reshape(*torch.rand(32,...`
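To illustrate why that reshape may be where the batch dimension goes wrong (a generic PyTorch point, not specific to this library, and the target shape below is only a guess since the snippet is cut off): `reshape` just reinterprets the flat buffer, so a mismatched target shape silently turns the leading dim into something that is no longer the batch.

```python
import torch

x = torch.rand(32, 10, 4, 8)   # [batch, seq_len, num_heads, head_dim]

# reshaping to a shape whose leading dim isn't 32 does not error out --
# it regroups the same 10240 values, so the "batch" dim becomes 4
y = x.reshape(4, 32, 10, 8)
print(y.shape)  # torch.Size([4, 32, 10, 8])

# each y[i] now packs values from 8 different batch items together,
# which is why downstream code sees 1/8th of the expected batch size
```

If the goal is only to regroup heads, an explicit `transpose` before `reshape`/`view` keeps the batch dim intact.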
Ok, I have tried what you suggested but it has not worked. I have the code below, and I'll try to explain what's wrong with it from my perspective:...
I mean, sure, in most cases it is a bad idea, but that goes for most things in life. For instance, it's generally not a good idea not to...