Preetam Chhimpa
Quick update:
- `test_training_overfit` for BLT now passes locally with the overridden thresholds in `BltModelTest` (loss ~95% reduction, grad norm ~81% reduction), and generation overfits the fixed pattern.
- CI...
> I found it weird that the generation is not working with use_cache=True. I think it is worth investigating why (cc: @itazap if you have time to guide @preetam1407 )...
> checking monday, it is weird to me that `make fixup` doesnt work as expected. You shouldn't have to add those placeholders to begin with

@ArthurZucker This is fixed now....
@3outeille, I'll be waiting for your review! I think we have resolved all the issues mentioned last week.
> alright, just last issue to address and it will be good to merge. Good job overall ! 🚀

Hey @3outeille, could you please point me to the last remaining...
@3outeille, updated `tests/test_training_mixin.py` to set `use_cache=True` only for `model_type == "recurrent_gemma"`.
A few CI checks are still failing. The `CI tests_tokenization` failures look infra-related, similar to some earlier CI failures in this PR. I ran the failing tests locally, and they...
@3outeille, all requested changes are done. Whenever you get time, I'd appreciate it if you could take a look and merge it. Thanks a lot!
Hi! I’d like to work on this