Cyril Vallez
@ArthurZucker Should I try to rebase once again to see if it solves the issue?
Hi @zucchini-nlp! When rebasing I noticed that in your recent https://github.com/huggingface/transformers/pull/30483 you made QuantizedCache a subclass of DynamicCache. However, some code paths in `generate()` rely on checking `isinstance(cache, DynamicCache)` (this...
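To make the subclass concern concrete, here is a minimal sketch using stand-in classes (these are hypothetical placeholders, not the real `transformers` cache implementations, and `takes_dynamic_path` / `takes_dynamic_path_strict` are names made up for illustration). It only shows that an `isinstance` check against the parent class also matches the subclass, while an exact type check does not:

```python
class DynamicCache:
    """Hypothetical stand-in for the library's dynamic cache class."""


class QuantizedCache(DynamicCache):
    """Hypothetical stand-in: a subclass of the dynamic cache."""


def takes_dynamic_path(cache) -> bool:
    # isinstance() is True for the class itself and for any subclass.
    return isinstance(cache, DynamicCache)


def takes_dynamic_path_strict(cache) -> bool:
    # An exact type comparison excludes subclasses.
    return type(cache) is DynamicCache


quantized = QuantizedCache()
dynamic = DynamicCache()

print(takes_dynamic_path(dynamic))           # True
print(takes_dynamic_path(quantized))         # True
print(takes_dynamic_path_strict(quantized))  # False
```

Whether the exact-type check or a reordered `isinstance` chain is the better fit depends on how the real code paths are meant to treat quantized caches.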
@ArthurZucker Rebasing is done and all CIs are green!
Yes, it will fail in the current state of the library, but adding support will be straightforward and won't require messing with `generation.utils` after this is merged!
Hi @ArthurZucker, any news? 😁
@ArthurZucker The modeling update was mostly used when `inputs_embeds` was passed, but checking `past_length == 0` is equivalent and cleaner, so I switched to that. I also removed the `TODO:...
I just added the change to more models and rebased to avoid conflicts with new commits in main! For Cohere-based models, I most notably computed a **memory gain ratio of...
Will do! However, when playing with `torch.compile`, I noticed that adding a `logger.warning_once()` in the `forward` breaks the graph with the following error: `Unsupported: call_method UserDefinedObjectVariable(Logger) warning_once [ConstantVariable()] {}`. This...
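For context, a warn-once helper is essentially a memoized logging call, i.e. a side-effecting Python call sitting inside `forward`. A minimal stdlib-only sketch, assuming the helper behaves like a cached `logger.warning` (`warn_once`, `ListHandler`, and the `"demo"` logger name are hypothetical, not the library's code):

```python
import functools
import logging


class ListHandler(logging.Handler):
    """Collects emitted messages in a list so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("demo")
logger.propagate = False  # keep the demo output out of the root logger
handler = ListHandler()
logger.addHandler(handler)


@functools.lru_cache(maxsize=None)
def warn_once(message: str) -> None:
    # The cache keys on the message, so a repeated message is not logged again.
    logger.warning(message)


warn_once("cache layout changed")
warn_once("cache layout changed")  # cache hit: nothing is logged
print(len(handler.messages))
```

This only illustrates the warn-once behavior in plain Python; it does not reproduce or work around the graph break itself.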
DO NOT MERGE YET. Everything else is good, but I still need to sort out the `logger.warning_once`/`compile` issue.
@ArthurZucker @gante everything is now ready. From my tests, it seems like `compile` does not support any print-like functionality at the moment, whether from `print`, `logger`, or `warnings`. I first...