Raushan Turganbay

Results: 117 comments by Raushan Turganbay

I ran the datasets for each of the tasks. From the HHH alignment dataset, I took only the "other" subset, and for MT-Bench the first 100 questions. The...
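
For context, a minimal sketch of that sampling (the Hub dataset ids and split names here are my assumptions, not taken from the comment):

```python
from datasets import load_dataset

# HHH alignment: keep only the "other" subset.
hhh_other = load_dataset("HuggingFaceH4/hhh_alignment", "other", split="test")

# MT-Bench: keep only the first 100 questions.
mt_bench = load_dataset("HuggingFaceH4/mt_bench_prompts", split="train")
mt_bench = mt_bench.select(range(100))
```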

Yes, I opened PR [131](https://github.com/argilla-io/distilabel/pull/131)

@hxhcreate @NielsRogge yes, that is a known issue and I merged a fix a few days ago. Unfortunately the refactoring broke some things; let me know if updating to the latest `main`...

Hey! This should be solvable by popping `cache_position` from `inputs` in [this method](https://github.com/haotian-liu/LLaVA/blob/c121f0432da27facab705978f83c4ada465e46fd/llava/model/language_model/llava_llama.py#L144) with `inputs.pop("cache_position")`. The error is raised because calling `super()` returns kwargs that are not used in the...
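
A minimal sketch of the suggested fix inside LLaVA's `prepare_inputs_for_generation` override (the signature is paraphrased from the linked file, not copied verbatim):

```python
def prepare_inputs_for_generation(self, input_ids, images=None, **kwargs):
    # super() builds the standard inputs dict but may include kwargs
    # (such as cache_position) that this model's forward() does not accept.
    inputs = super().prepare_inputs_for_generation(input_ids, **kwargs)
    inputs.pop("cache_position", None)  # the suggested fix
    if images is not None:
        inputs["images"] = images
    return inputs
```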

@ArthurZucker yes, making a versatile cache class will go in another PR. In that case we can leave `quanto` as the only choice available, and the rest can be implemented...

@ArthurZucker @gante I made a few changes since the last review: 1. We now support HQQ and quanto (quanto by default, as it is a bit faster; we'll work on...
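
For reference, a short usage sketch of the quantized cache via `generate` (the model id is a placeholder, and the exact `cache_config` keys should be treated as assumptions):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=20,
    cache_implementation="quantized",
    # "quanto" is the default backend; HQQ is the other supported one.
    cache_config={"backend": "quanto", "nbits": 4},
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```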

Cool, merging 🤞🏻 Ran the slow quantization and generation tests locally; everything is passing.

@ydshieh This PR actually results in a slowdown because of quantization 😅 But we can probably check the memory usage. Here is a [script](https://gist.github.com/zucchini-nlp/56ce57276d7b1ee666e957912d8d36ca) I used, but you'd have to replace...
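
Independent of the gist, a hedged sketch of how peak memory could be compared (`model` and `inputs` are placeholders, e.g. from the sketch above):

```python
import torch

def peak_memory_gib(extra_kwargs):
    """Peak GPU memory for one generate() call, in GiB."""
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    model.generate(**inputs, max_new_tokens=256, **extra_kwargs)
    return torch.cuda.max_memory_allocated() / 1024**3

print("fp16 cache:     ", peak_memory_gib({}))
print("quantized cache:", peak_memory_gib(
    {"cache_implementation": "quantized",
     "cache_config": {"backend": "quanto", "nbits": 4}}
))
```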

@Cyrilvallez Right, QuantizedCache stores most of the past key/values in a private list, so I think these methods would not work even before your changes. Thanks for noticing! I will...
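
To illustrate the point, a simplified toy sketch (attribute names are assumptions, not the library's exact internals):

```python
class ToyQuantizedCache:
    """Toy illustration: compressed history lives in private lists, while
    the public key_cache/value_cache hold only a short residual window."""

    def __init__(self):
        self._quantized_key_cache = []    # compressed past keys (private)
        self._quantized_value_cache = []  # compressed past values (private)
        self.key_cache = []               # unquantized residual only (public)
        self.value_cache = []

    def get_seq_length(self, layer_idx: int = 0) -> int:
        # A helper that inspects only the public lists undercounts the true
        # length, since most tokens sit in the private quantized lists.
        if len(self.key_cache) <= layer_idx:
            return 0
        return self.key_cache[layer_idx].shape[-2]
```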

@hegderavin sure, we will be porting models one by one (#28981). Right now I am waiting for this PR to be merged, so that we can work on other models...