
Vocab size of 102400 exceeds the max CUDA blocksize of 65536 on an A40 GPU

Open: songkq opened this issue 5 months ago • 3 comments

@danielhanchen @shimmyshimmer Hi, do you have plans to develop efficient kernels for this case? On an A40 GPU I get the following warning:

Unsloth: Vocab size of 102400 exceeds the max CUDA blocksize of 65536. For now, Unsloth will use Pytorch's CrossEntropyLoss, which will entail a 25% increase in memory usage and be slower.
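To make the request concrete, here is a minimal sketch of one way a cross-entropy loss could handle a vocabulary larger than the block limit quoted in the warning: split the logits into chunks of at most 65536 entries, take a log-sum-exp per chunk, then combine the partial results. This is plain Python for illustration only, not Unsloth's kernel and not a claim about how Unsloth does or will implement it; the names `MAX_BLOCK`, `logsumexp`, and `chunked_cross_entropy` are hypothetical.

```python
import math

# Block-size limit quoted in the warning (assumed here as the chunk size).
MAX_BLOCK = 65536


def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) over a sequence of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def chunked_cross_entropy(logits, target, chunk=MAX_BLOCK):
    """Cross-entropy for one token, computed chunk by chunk.

    Each chunk holds at most `chunk` logits, so no single pass needs to
    cover the whole vocabulary. The per-chunk log-sum-exps are combined
    with one more log-sum-exp, which equals the full-vocabulary value.
    """
    partials = [
        logsumexp(logits[start:start + chunk])
        for start in range(0, len(logits), chunk)
    ]
    return logsumexp(partials) - logits[target]


if __name__ == "__main__":
    vocab_size = 102400  # vocab size from the warning
    logits = [math.sin(i) for i in range(vocab_size)]  # made-up logits
    print(chunked_cross_entropy(logits, target=7))
```

With a vocab size of 102400 and a chunk size of 65536 this needs two chunks (65536 and 36864 entries), and the result matches the unchunked loss up to floating-point rounding.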

songkq, Jan 15 '24 02:01