transformers icon indicating copy to clipboard operation
transformers copied to clipboard

num_noise_spans should be <= num_items #22246

Open alexcpn opened this issue 1 year ago • 5 comments

What does this PR do?

Fixes #22246

When mean_noise_span_length is set to 1 there are cases (for example noise_density=.55 when the num_noise_spans becomes greater than num_nonnoise_tokens

So the correction seems to be to consider also the num_nonnoise_tokens in calculation of num_noise_spans

num_noise_spans = int(np.round(min(num_noise_tokens,num_nonnoise_tokens) / self.mean_noise_span_length))

Demonstration of the buggy behaviour https://gist.github.com/alexcpn/b9bb2b0f01833d1bb862502faf99bab8#file-t5_denoising-py

Demonstration of the possible correction
https://gist.github.com/alexcpn/b9bb2b0f01833d1bb862502faf99bab8#file-t5_denoising_corrected-py

Who can review?

@sanchit-gandhi

alexcpn avatar Apr 22 '23 14:04 alexcpn

The documentation is not available anymore as the PR was closed or merged.

Thanks- Done - Did make style and pushed the changes to this branch

alexcpn avatar Apr 28 '23 04:04 alexcpn

Thanks @alexcpn - unfortunately the CI is still unhappy about the code style! Could you try rebasing onto main, run style fix, and then force pushing?

git rebase upstream/main
make style
git push -f issue-22246

sanchit-gandhi avatar Apr 28 '23 17:04 sanchit-gandhi

I have rebased and checked and force-pushed. It is actually fine locally.

git branch
* issue-22246
  main
$ make style
black examples tests src utils setup.py
All done! ✨ 🍰 ✨
2380 files left unchanged.
ruff examples tests src utils setup.py --fix
make: ruff: No such file or directory
make: *** [Makefile:69: style] Error 127

Understood the problem - I was having black 22.x and CircleCI is using 23.x - https://github.com/huggingface/transformers/pull/21480

alexcpn avatar Apr 29 '23 07:04 alexcpn

@sanchit-gandhi CI is green

alexcpn avatar Apr 30 '23 01:04 alexcpn

Awesome, nice find @alexcpn 🙌 Let's get a final review and get the PR merged!

sanchit-gandhi avatar May 02 '23 16:05 sanchit-gandhi