unsloth
Vocab size of 102400 exceeds the max CUDA blocksize of 65536 using A40 GPU
@danielhanchen @shimmyshimmer Hi, do you have plans to develop efficient kernels for this case? I get the following warning:

Unsloth: Vocab size of 102400 exceeds the max CUDA blocksize of 65536. For now, Unsloth will use Pytorch's CrossEntropyLoss, which will entail a 25% increase in memory usage and be slower.
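To make the request concrete, here is a minimal pure-Python sketch of one way a kernel could stay within the block limit: split each row of logits into chunks of at most 65536 entries, take a log-sum-exp per chunk, and combine the partial results. This is only an illustration of the math, not Unsloth's implementation; the function names (`logsumexp`, `chunked_cross_entropy`, `num_chunks`) are hypothetical, and a real kernel would run this on the GPU.

```python
import math

MAX_BLOCK = 65536  # block-size limit quoted in the warning


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def num_chunks(vocab_size, max_block=MAX_BLOCK):
    """Number of chunks needed so that no chunk exceeds max_block."""
    return math.ceil(vocab_size / max_block)


def chunked_cross_entropy(logits, target, max_block=MAX_BLOCK):
    """Cross-entropy loss for one row of logits, computed chunk by chunk.

    Hypothetical helper: each chunk holds at most max_block logits, so no
    single reduction has to span the whole vocabulary.
    """
    partials = [
        logsumexp(logits[start:start + max_block])
        for start in range(0, len(logits), max_block)
    ]
    # The log-sum-exp of the per-chunk results equals the log-sum-exp
    # over the full row, so the loss matches the unchunked computation.
    return logsumexp(partials) - logits[target]
```

With a vocabulary of 102400 and a limit of 65536, `num_chunks(102400)` gives two chunks per row, and the chunked loss agrees with the unchunked one up to floating-point rounding.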