Nicolas Patry

Results 978 comments of Nicolas Patry

Oh no, that cannot change. But the idea is that you can call it for a very long range (like `max_new_tokens=100`), which will use the `past_key_values` over and over without...

Yes, its intended goal is to decide when to stop generating tokens (hence the return type: false means continue generating, true means stop; iteration will stop when ANY criterion wants...
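To make the "stop when ANY criterion says so" rule concrete, here is a minimal pure-Python sketch. It is not the library's implementation: `StopCriterion`, `max_length`, `ends_with`, `generate`, and `count_up` are all hypothetical names made up for illustration, and the "model" is just a function that returns the next integer.

```python
from typing import Callable, List, Sequence

# Hypothetical type: a criterion inspects the tokens produced so far
# and returns True to stop, False to keep generating.
StopCriterion = Callable[[Sequence[int]], bool]


def max_length(limit: int) -> StopCriterion:
    """Stop once the sequence holds at least `limit` tokens."""
    return lambda tokens: len(tokens) >= limit


def ends_with(token_id: int) -> StopCriterion:
    """Stop once the last token equals `token_id` (e.g. an end-of-sequence id)."""
    return lambda tokens: len(tokens) > 0 and tokens[-1] == token_id


def generate(
    next_token: Callable[[Sequence[int]], int],
    prompt: Sequence[int],
    criteria: List[StopCriterion],
) -> List[int]:
    """Append tokens until ANY criterion returns True."""
    tokens = list(prompt)
    # Assumption of this sketch: criteria are checked before each new token,
    # so a prompt that already satisfies one is returned unchanged.
    while not any(criterion(tokens) for criterion in criteria):
        tokens.append(next_token(tokens))
    return tokens


def count_up(tokens: Sequence[int]) -> int:
    """Toy stand-in for a model: the next token is the last one plus one."""
    return tokens[-1] + 1


if __name__ == "__main__":
    print(generate(count_up, [0], [max_length(10), ends_with(3)]))
```

In this toy run the `ends_with(3)` criterion fires before `max_length(10)` does, so generation stops early even though the length limit has not been reached.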

> into a simple function that you call in the preprocess?

Sure, I'm not sure I understand how that cleans up the audio trimming, but we can definitely abstract away.

> could be awesome to have model.detect_language instead of all the mess above and dependencies on whisper!

If you have some good ideas, please suggest them instead of waving them...

I'm not well versed with `Git` as a model. Pipelines are usually agnostic to actual models. As long as model X is `AutoModelForVision2Seq` it should work out of the box....

Yes, that's exactly it. In the absence of tags, the hub will check the config and assign a pipeline based on the architecture name format `ForXX`, just like the pipeline does.
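A rough sketch of that fallback, assuming a small hand-written suffix table. This is not the hub's actual lookup: `SUFFIX_TO_TASK` and `guess_task` are hypothetical names, and the table only lists a few well-known `ForXX` suffixes with the pipeline tasks they usually correspond to.

```python
from typing import Optional, Sequence

# Hypothetical, partial table: architecture-name suffix -> pipeline task.
SUFFIX_TO_TASK = {
    "ForVision2Seq": "image-to-text",
    "ForSequenceClassification": "text-classification",
    "ForTokenClassification": "token-classification",
    "ForQuestionAnswering": "question-answering",
    "ForCausalLM": "text-generation",
}


def guess_task(
    architectures: Sequence[str], tags: Sequence[str] = ()
) -> Optional[str]:
    """Prefer an explicit task tag; otherwise infer from the `ForXX` suffix."""
    known_tasks = set(SUFFIX_TO_TASK.values())
    # An explicit tag on the model wins over anything inferred from the config.
    for tag in tags:
        if tag in known_tasks:
            return tag
    # No usable tag: fall back to the architecture names listed in the config.
    for architecture in architectures:
        for suffix, task in SUFFIX_TO_TASK.items():
            if architecture.endswith(suffix):
                return task
    return None


if __name__ == "__main__":
    # "MyModelForVision2Seq" is a made-up architecture name.
    print(guess_task(["MyModelForVision2Seq"]))
```

Under this sketch, a custom model only needs an architecture name ending in `ForVision2Seq` to be routed to the image-to-text task when it has no tags.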

Do you have a sample script to make it work for captioning?

Seems to me that the colab does pretty much what the pipeline does: https://github.com/huggingface/transformers/blob/main/src/transformers/pipelines/image_to_text.py#L114 Any reason not to implement `ForVision2Seq`?

> It is a custom model but has the same API as the AutoModelForVision2Seq class

So make it `ForVision2Seq`, no? As long as it upholds the invariant (signature +...