Connor Henderson

32 comments by Connor Henderson

When I had this error, limiting the specified max_new_tokens to the number of tokens the model can generate per chunk fixed it for me (see max_length in the [generation_config.json](https://huggingface.co/openai/whisper-base/blob/main/generation_config.json)). Looks like that might...
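The constraint can be sketched with a small helper: the per-chunk generation budget is the model's max_length minus whatever decoder prompt tokens are already consumed. This is an illustrative function (clamp_max_new_tokens is not a transformers API); the 448 below is whisper-base's max_length from its generation_config.json.

```python
# Illustrative helper (not part of transformers): cap max_new_tokens so a
# chunk's generated tokens fit within the model's configured max_length.
def clamp_max_new_tokens(requested: int, max_length: int, prompt_len: int) -> int:
    # Tokens still available for generation in this chunk after the
    # decoder prompt tokens are accounted for.
    available = max_length - prompt_len
    return min(requested, available)

# whisper-base sets max_length to 448; with a 4-token decoder prompt,
# a request of 500 new tokens must be clamped to 444.
print(clamp_max_new_tokens(500, 448, 4))  # 444
```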

@JuheonChu I'm also interested in working on this, any interest in collaborating? Planning to get up to speed and then open a fresh PR.

@younesbelkada Looks like it will be essentially the same fix across the other models too. Do you want me to pull that fix into a utility function once merged? Just...

Note: If you would like to get matching tokenization before a fix goes in, installing `ftfy` first should do it. I initially looked to fix this specific issue around apostrophes, but...

Hey Arthur and xenova, in my case uninstalling ftfy or commenting out [these import lines](https://github.com/huggingface/transformers/blob/48327c57182fdade7f7797d1eaad2d166de5c55b/src/transformers/models/clip/tokenization_clip.py#L313-L317) reproduces the issue, I believe because [this conditional](https://github.com/huggingface/transformers/blob/48327c57182fdade7f7797d1eaad2d166de5c55b/src/transformers/models/clip/tokenization_clip.py#L469-L470) determines whether the BasicTokenizer is used for...
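The import-guarded fallback pattern behind that conditional can be sketched as follows. The function name and the fallback cleanup below are illustrative, not the actual tokenization_clip.py code; the point is that the chosen preprocessing path depends on whether ftfy imports successfully.

```python
# Sketch of an import-guarded fallback (names are illustrative): prefer
# ftfy's text fixing when it is installed, otherwise fall back to a
# simpler BasicTokenizer-style whitespace cleanup.
def build_text_cleaner():
    try:
        import ftfy
        # ftfy path: repairs mojibake and normalizes unicode quirks.
        return ftfy.fix_text
    except ImportError:
        # Fallback path: a much cruder cleanup, so tokenization can
        # differ from the ftfy path on edge cases like curly apostrophes.
        return lambda text: " ".join(text.split())

clean = build_text_cleaner()
```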

@sgugger yes, thanks, that is what I was saying. I think this comes down to the expected behavior when using the BasicTokenizer generally. If it is supposed to match the...

> Hey this PR looks really good (although I'll leave the actual review to Sanchit or Arthur). > > I was just wondering whether it also makes sense to support...

To-do list before re-requesting review - [x] **Converting the prompt token to an ID in an instance variable gives an incorrect ID, unlike when it's called in decode** --Given we're...

OK, the additionally requested features are now added, so I believe this is ready for re-review. Thank you for your comments! > However note that this pipeline only processes the...

> 1. Add the prompt_ids to model.generate() as in your earlier version of the PR. All this does is insert the prompt in the section. This doesn't give us the OpenAI...
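The insertion step being discussed can be sketched like this: the prompt's token ids are placed in a "previous context" section ahead of the decoder's start sequence, so the model attends to them as context while they are excluded from the returned text. The function name and token ids below are illustrative placeholders, not the transformers implementation.

```python
# Sketch (illustrative, not the transformers code): build the decoder
# input by prepending a previous-context marker and the prompt ids to
# the forced decoder start sequence.
def build_decoder_input(prompt_ids, forced_decoder_ids, prev_token_id=50361):
    # prev_token_id stands in for a <|startofprev|>-style marker that
    # introduces the prompt section before the normal start tokens.
    return [prev_token_id] + list(prompt_ids) + list(forced_decoder_ids)

# Made-up ids, just to show the ordering of the sections.
print(build_decoder_input([11, 12], [50258, 50259]))
```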