Chien-Chin Huang

Results: 119 comments by Chien-Chin Huang

We do see some requests to apply CP to only some regions. This will require some communication before SDPA, as we will have to borrow some ranks from the DP...

@wwwjn Do you think your implementation is general enough to put into the core train.py?

@wwwjn Can this be related to the rng state not being saved?

@wwwjn Can you take a look and confirm that the Flux checkpoint is actually working? Thanks!

@wwwjn Have you encountered this issue when running on the devgpu? Landing the PR doesn't look harmful, but I want to understand why this is required specifically for the Flux encoder. Is...

For `In-training validation`, it looks like solution 1 is a straightforward and common solution, though some design work is needed to make it fit into TorchTitan's trainer architecture. As for `After-training evaluation`,...

If it is just a one-to-one mapping, then that's not too bad. But as you can see in the file, there is more than just name conversion. The model...

@pradeepfn Please take a look at this PR.

When you save in PyTorch format, is it a full tensor (non-DTensor) that is saved with torch.save()?