Gabriele Sarti

Results: 46 comments by Gabriele Sarti

I can try to explain the issue better at our next meeting, but for now you can simply proceed assuming that greedy decoding is used!

Hi @LuukSuurmeijer, The code still had some issues due to FP8 conversion not handling `nans` by default (target-side attribution matrices contain nan for future tokens). Since torch now supports [two...
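For context, the kind of NaN masking involved can be sketched with the standard library alone. The function and variable names below are illustrative placeholders, not the Inseq API:

```python
import math

def sanitize_for_conversion(matrix, fill=0.0):
    """Replace NaN entries (e.g. the future-token positions of a
    target-side attribution matrix) with a finite fill value before
    a lossy dtype conversion that does not handle NaN by default.
    `matrix`, `fill`, and this function are hypothetical names."""
    return [[fill if math.isnan(v) else v for v in row] for row in matrix]

attributions = [[0.4, float("nan")], [0.1, 0.9]]
print(sanitize_for_conversion(attributions))
```

In a real tensor pipeline the same idea is typically expressed with a vectorized replacement rather than a Python loop.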

Hi @yuzhaouoe, thanks for opening this PR! Could you provide a minimal code snippet to reproduce the issue with bfloat16 you mention?

Hi @lsickert, could you confirm that you face this bug when installing `main`? There was a bug where batching was not applied to `model.generate`, causing this same issue, that was...

Documenting here another issue reported by @g8a9: when attributing a large set of examples at once, it could happen that an example that is too large to fit in GPU memory...

Yes, the idea of sorting by length was precisely aimed at avoiding the recomputation for the first big batch. As long as results are returned in the original order there...
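A minimal sketch of that strategy, assuming a generic `process_batch` callable (all names here are hypothetical, not Inseq's API): sort by length so the largest batch is attempted first, then scatter results back into the original order.

```python
def process_in_length_order(examples, process_batch, batch_size=4):
    """Process examples longest-first so an out-of-memory failure
    surfaces on the first batch instead of after recomputation,
    while returning results in the caller's original order."""
    # Indices of examples, longest first.
    order = sorted(range(len(examples)),
                   key=lambda i: len(examples[i]), reverse=True)
    results = [None] * len(examples)
    for start in range(0, len(order), batch_size):
        idxs = order[start:start + batch_size]
        batch_out = process_batch([examples[i] for i in idxs])
        # Scatter outputs back to the original positions.
        for i, out in zip(idxs, batch_out):
            results[i] = out
    return results

# Toy usage: "process" a batch by measuring lengths.
print(process_in_length_order(["aaa", "a", "aa"],
                              lambda batch: [len(x) for x in batch],
                              batch_size=2))
```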

Moving to CPU seems the best option of the two, but I'm still not sure whether this should be preferable to raising an error at the start to signal that...
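The CPU-fallback option could look roughly like the following sketch. The two callables are stand-ins for device-specific attribution code, and `MemoryError` stands in for the framework's actual out-of-memory exception (e.g. torch's CUDA OOM error); none of this is Inseq's API:

```python
def attribute_with_fallback(batch, run_on_gpu, run_on_cpu):
    """Try the GPU path first; if the batch does not fit, retry the
    same batch on CPU instead of aborting the whole attribution run."""
    try:
        return run_on_gpu(batch)
    except MemoryError:
        # In real code this would catch the framework-specific
        # out-of-memory error, not Python's MemoryError.
        return run_on_cpu(batch)

def gpu_path(batch):
    raise MemoryError("batch too large for GPU")  # simulated OOM

print(attribute_with_fallback([1, 2, 3], gpu_path,
                              lambda b: [x * 2 for x in b]))
```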

Hi @frankdarkluo, sadly not yet! But there is a WIP PR for it here: #217

Thanks for the report @DanielSc4! We'll evaluate how complex it would be to support `PeftModel` classes out of the box in Inseq. In the meantime, a viable workaround is to use `model.merge_and_unload()` to...
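For reference, that workaround could look like the sketch below. It requires the `peft` and `transformers` packages plus downloaded weights, and the model name and adapter path are placeholders:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# "base-model" and "path/to/adapter" are hypothetical placeholders.
base = AutoModelForCausalLM.from_pretrained("base-model")
model = PeftModel.from_pretrained(base, "path/to/adapter")

# Fold the adapter weights into the base model so that downstream
# libraries expecting a plain transformers model can use it.
merged = model.merge_and_unload()
```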