Yuncong Chen

3 comments by Yuncong Chen

For gemma-3-27b-it, I could not get the logits of this implementation to match the HF version, so I started tracking down where they deviate. This led me to a...
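For anyone chasing a similar mismatch, here is a rough sketch of one way to find where two implementations start to deviate. It is not necessarily what was done above. It assumes you have already captured per-layer activations from both models and flattened each layer into a plain list of floats. The names `max_abs_diff` and `first_divergent_layer` and the `atol` default are made up for illustration.

```python
from typing import Optional, Sequence


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two flat activations."""
    if len(a) != len(b):
        raise ValueError("activations must have the same number of elements")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def first_divergent_layer(
    ref_layers: Sequence[Sequence[float]],
    test_layers: Sequence[Sequence[float]],
    atol: float = 1e-3,  # hypothetical tolerance; tune for your dtype
) -> Optional[int]:
    """Return the index of the first layer whose outputs differ by more than atol.

    Returns None if every compared layer is within tolerance.
    """
    for idx, (ref, test) in enumerate(zip(ref_layers, test_layers)):
        if max_abs_diff(ref, test) > atol:
            return idx
    return None


if __name__ == "__main__":
    # Toy data standing in for activations dumped from the two models.
    ref = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    test = [[0.1, 0.2], [0.3, 0.45], [0.9, 0.6]]
    print(first_divergent_layer(ref, test))
```

Once the first divergent layer is known, the same comparison can be repeated on that layer's sub-modules (norm, attention, MLP) to narrow it down further. With low-precision dtypes the tolerance probably needs to be looser.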

Context parallelism is one of the major features missing in torchtune.

I'm still getting an error related to this. Please see https://github.com/pytorch/pytorch/issues/155463#issuecomment-3060504993.

> I still got this error `AssertionError: found no DeviceMesh from dtensor args for c10d.broadcast_.default!` if I use `tp_plan='auto'`...