Colm Evans
Forgive my ignorance but I'm not really sure what this issue means. In conv2d the line: `ret = (x * weight.reshape(1, groups, rcout, 1, 1, cin, H, W)).sum((-3, -2, -1)).reshape(bs,...
Please do, I still don't get what is meant by "einops".
Thank you for clarifying.
That's basically what I was thinking of, except specific to the forward module and not only for fine-tuning.
Also, I did a little profiling on a CPU with a smaller model: batch size 4, 1024 tokens, and 8 experts (3 used per token). Initializing the model...
Happy new year! You're right that it would be slower than the method currently used for sparse mixture of experts, but I don't know if it would be that much...
So it turns out I'm really f**ing stupid and forgot to add a line to actually run the model on some input when I was profiling. It is not actually...
How many of these re-published tweets are slightly changed and how many are exactly the same (i.e. copy-pasted)? If most are exactly the same, would storing a hash of the...
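To make the hashing idea concrete, here is a minimal sketch of exact-duplicate detection. It assumes the hash is taken over the raw tweet text; the helper names `text_digest` and `split_exact_duplicates` are hypothetical and not from any existing codebase. It only catches byte-for-byte copies, so slightly changed tweets would still pass through as new.

```python
import hashlib


def text_digest(text: str) -> str:
    """Return the SHA-256 hex digest of the text's UTF-8 bytes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def split_exact_duplicates(texts):
    """Split texts into first occurrences and exact repeats.

    Assumption: two items count as duplicates only if their text is
    identical character for character (no normalization is applied).
    """
    seen = set()       # digests of every text seen so far
    unique = []        # first occurrence of each distinct text
    repeats = []       # later copies whose digest was already seen
    for text in texts:
        digest = text_digest(text)
        if digest in seen:
            repeats.append(text)
        else:
            seen.add(digest)
            unique.append(text)
    return unique, repeats
```

Storing only the digests keeps the lookup set small and fixed-size per entry, which matters if most re-published items really are verbatim copies.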