Wizyoung
https://gist.github.com/wizyoung/5330ad501e73a97dfe2f0088decdb1ca I have implemented a torch.compile version of chunked_lce that supports soft caps and passes all numerical accuracy tests in benchmark_fused_linear_cross_entropy.py (modified from the one in this repo). My main concern is the...
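For readers who have not opened the gist, here is a rough sketch of what a chunked linear cross entropy with a soft cap computes. It is not the gist's code: the function name `chunked_lce_sketch`, its parameters, and the `cap * tanh(logits / cap)` form of the soft cap are assumptions made for illustration. It covers only the forward numerics, leaves out `torch.compile`, and assumes every target is a valid class id (no ignore index).

```python
import torch
import torch.nn.functional as F


def chunked_lce_sketch(x, weight, target, softcap=None, chunk_size=1024):
    """Mean cross entropy of (x @ weight.T) against target, computed chunk by chunk.

    Hypothetical helper for illustration only.
    x: (N, H) hidden states, weight: (V, H) classifier weight, target: (N,) class ids.
    """
    total = x.new_zeros((), dtype=torch.float32)
    for start in range(0, x.shape[0], chunk_size):
        end = start + chunk_size
        # Materialize logits for one chunk of rows only: shape (chunk, V).
        logits = (x[start:end] @ weight.T).float()
        if softcap is not None:
            # Assumed soft cap: squash logits smoothly into (-softcap, softcap).
            logits = softcap * torch.tanh(logits / softcap)
        # Sum per chunk, divide once at the end, so the result equals the global mean.
        total = total + F.cross_entropy(logits, target[start:end], reduction="sum")
    return total / target.numel()


if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(10, 8)
    w = torch.randn(16, 8)
    t = torch.randint(0, 16, (10,))
    reference = F.cross_entropy(5.0 * torch.tanh((x @ w.T) / 5.0), t)
    chunked = chunked_lce_sketch(x, w, t, softcap=5.0, chunk_size=3)
    print(torch.allclose(chunked, reference, atol=1e-5))
```

Note that under autograd this loop still keeps every chunk's logits alive for the backward pass. To actually save memory during training, an implementation would need to compute the gradients chunk by chunk inside the loop (or recompute them), which this sketch does not attempt.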
> @wizyoung I agree there's some additional memory overhead (in particular, I think we don't inplace the addmm), but the additional memory is generally pretty negligible here, no? > >...
@Chillee I have updated my scripts here: https://gist.github.com/wizyoung/5330ad501e73a97dfe2f0088decdb1ca