transformers
transformers copied to clipboard
num_noise_spans should be <= num_items #22246
What does this PR do?
Fixes #22246
When mean_noise_span_length
is set to 1 there are cases (for example noise_density=.55
when the num_noise_spans
becomes greater than num_nonnoise_tokens
So the correction seems to be to consider also the num_nonnoise_tokens
in calculation of num_noise_spans
num_noise_spans = int(np.round(min(num_noise_tokens,num_nonnoise_tokens) / self.mean_noise_span_length))
Demonstration of the buggy behaviour https://gist.github.com/alexcpn/b9bb2b0f01833d1bb862502faf99bab8#file-t5_denoising-py
Demonstration of the possible correction
https://gist.github.com/alexcpn/b9bb2b0f01833d1bb862502faf99bab8#file-t5_denoising_corrected-py
Who can review?
@sanchit-gandhi
The documentation is not available anymore as the PR was closed or merged.
Thanks- Done - Did make style
and pushed the changes to this branch
Thanks @alexcpn - unfortunately the CI is still unhappy about the code style! Could you try rebasing onto main, run style fix, and then force pushing?
git rebase upstream/main
make style
git push -f issue-22246
I have rebased and checked and force-pushed. It is actually fine locally.
git branch
* issue-22246
main
$ make style
black examples tests src utils setup.py
All done! ✨ 🍰 ✨
2380 files left unchanged.
ruff examples tests src utils setup.py --fix
make: ruff: No such file or directory
make: *** [Makefile:69: style] Error 127
Understood the problem - I was having black 22.x and CircleCI is using 23.x - https://github.com/huggingface/transformers/pull/21480
@sanchit-gandhi CI is green
Awesome, nice find @alexcpn 🙌 Let's get a final review and get the PR merged!