Preetam Chhimpa comments

Results: 9 comments of Preetam Chhimpa

Fix BLT training_ci overfit test

Quick update:
- `test_training_overfit` for BLT now passes locally with the overridden thresholds in `BltModelTest` (loss ~95% reduction, grad norm ~81% reduction), and generation overfits the fixed pattern.
- CI...
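For readers unfamiliar with this kind of check, here is a minimal sketch of how a percentage-reduction threshold on loss and gradient norm could be evaluated. The function names and the default thresholds (90% and 80%) are hypothetical and chosen only for illustration; they are not the values or the code used in `BltModelTest`.

```python
def reduction_pct(initial: float, final: float) -> float:
    """Return how much `final` dropped relative to `initial`, in percent."""
    if initial <= 0:
        raise ValueError("initial value must be positive")
    return 100.0 * (initial - final) / initial


def passes_overfit_check(
    initial_loss: float,
    final_loss: float,
    initial_grad_norm: float,
    final_grad_norm: float,
    loss_threshold_pct: float = 90.0,  # hypothetical threshold
    grad_threshold_pct: float = 80.0,  # hypothetical threshold
) -> bool:
    """True when both loss and grad norm fell by at least their thresholds."""
    loss_ok = reduction_pct(initial_loss, final_loss) >= loss_threshold_pct
    grad_ok = reduction_pct(initial_grad_norm, final_grad_norm) >= grad_threshold_pct
    return loss_ok and grad_ok
```

With these illustrative defaults, a run whose loss falls from 2.0 to 0.1 and whose grad norm falls from 1.0 to 0.19 would pass, while a run whose loss only halves would not.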

Fix BLT training_ci overfit test

> I found it weird that the generation is not working with use_cache=True. I think it is worth investigating why (cc: @itazap if you have time to guide @preetam1407 )...

Fix BLT training_ci overfit test

> checking monday, it is weird to me that `make fixup` doesn't work as expected. You shouldn't have to add those placeholders to begin with

@ArthurZucker This is fixed now....

Fix BLT training_ci overfit test

@3outeille, will be waiting for your review! I think we have resolved all the issues mentioned last week.

Fix BLT training_ci overfit test

> alright, just last issue to address and it will be good to merge. Good job overall ! 🚀

Hey @3outeille, could you please point me to the last remaining...

Fix BLT training_ci overfit test

@3outeille, updated `tests/test_training_mixin.py` to set `use_cache=True` only for `model_type == "recurrent_gemma"`.
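As a rough illustration of the change described above, the per-model switch might look like the sketch below. The helper name `generation_kwargs_for` is hypothetical, and the assumption that every other model type gets `use_cache=False` is mine; this is not the actual code in `tests/test_training_mixin.py`.

```python
def generation_kwargs_for(model_type: str) -> dict:
    """Build generation kwargs, enabling the cache only for recurrent_gemma.

    Hypothetical helper: assumes all other model types run with the
    cache disabled.
    """
    return {"use_cache": model_type == "recurrent_gemma"}
```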

Fix BLT training_ci overfit test

A few CI checks are still failing. The `CI tests_tokenization` failures look infra-related, similar to some earlier CI failures in this PR. I ran the failing tests locally, and they...

Fix BLT training_ci overfit test

@3outeille, all requested changes are done. Whenever you get time, I’d appreciate it if you could take a look so it can be merged. Thanks a lot!

Flux 2: The shape of the latent argument is undocumented

Hi! I’d like to work on this