Nicolas Patry
> Should it instead be that we wait until each beam reaches the stop_sequence or any other stopping criteria before stopping the generation process? @KMFODA I think `eos_token_id` is already...
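A minimal sketch of what I mean (assuming a GPT-2 checkpoint and a single-token stop sequence; a multi-token sequence would need a custom `StoppingCriteria`):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Tokenize the stop sequence; if it maps to a single token, its id can be
# passed straight to `generate` as `eos_token_id`, and generation stops
# once it is produced (or once every beam produces it, with beam search).
stop_ids = tokenizer("\n", add_special_tokens=False).input_ids
assert len(stop_ids) == 1, "multi-token stop sequences need a StoppingCriteria"

inputs = tokenizer("The answer is", return_tensors="pt")
outputs = model.generate(**inputs, eos_token_id=stop_ids[0], max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```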
For the tests, removing the breakpoint should help. Then for code quality:

```
pip install -e .[quality]
make fixup
```

Should do the trick.
> If we were to move stop_sequence to be in generate wouldn't we have to tokenise it first. In that case what's the reasoning behind feeding it as a stop_sequence...
Ok, for this part I will let @NielsRogge comment, as I am not the best person to say how it should be done.
Ok, marking as draft while the other PR is being worked on.
@sgugger If you want to take a look at the tests: right now they are failing since the cache was written for the previous HEAD code. We can do...
> The caching part should go more in the huggingface_hub IMO, especially now that we rely on it for everything. But I also think people might have strong opinion on...
@ankrgyl This PR caches files for a very short time (10s) because, most of the time, users will want new models as soon as they exist. You can try...
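To illustrate the idea (a hypothetical sketch, not the PR's actual code): a cached response is reused only while it is less than 10 seconds old, so a freshly published model shows up after at most one TTL window.

```python
import time

# Hypothetical sketch: cache a network response per URL and reuse it
# only if it is less than TTL seconds old.
_CACHE = {}  # url -> (timestamp, response)
TTL = 10.0

def cached_fetch(url, fetch):
    now = time.monotonic()
    hit = _CACHE.get(url)
    if hit is not None and now - hit[0] < TTL:
        return hit[1]  # fresh enough: skip the network call
    response = fetch(url)  # stale or missing: hit the network again
    _CACHE[url] = (now, response)
    return response
```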
Btw, @sgugger is working on a better fix; we should reduce the number of network calls to as close to 1 as possible.
@davidmezzetti I didn't see this issue, and I didn't see the regression. It should have been fixed here https://github.com/huggingface/transformers/pull/17906. Sorry it had time to ship in `4.20`. It will be reverted back...