Fix wrong special tokens being used for llmlingua-2

Open cornzz opened this issue 1 year ago • 1 comments

What does this PR do?

Fixes #181

The model_name parameter was not passed when initializing TokenClfDataset, so the dataset fell back to its default. As a result, the special tokens (bos/eos/pad) of the llmlingua-2-small model were always used, even with the xlm-roberta-large based compression model.
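To illustrate the failure mode, here is a minimal, self-contained sketch. It is not the actual LLMLingua code: `SketchTokenDataset`, `SPECIAL_TOKENS`, and `wrap` are hypothetical names, and the sketch assumes that the small model follows BERT-style special tokens while the large one follows XLM-RoBERTa-style tokens. It only shows how an omitted `model_name` silently selects the default token set.

```python
# Hypothetical stand-in for a dataset that picks special tokens by model name.
# The token strings are the standard BERT and XLM-RoBERTa special tokens.
SPECIAL_TOKENS = {
    "bert-base-multilingual-cased": {"bos": "[CLS]", "eos": "[SEP]", "pad": "[PAD]"},
    "xlm-roberta-large": {"bos": "<s>", "eos": "</s>", "pad": "<pad>"},
}


class SketchTokenDataset:
    def __init__(self, texts, model_name="bert-base-multilingual-cased"):
        self.texts = texts
        # Select the token set whose key appears in the model name.
        for key, tokens in SPECIAL_TOKENS.items():
            if key in model_name:
                self.tokens = tokens
                break
        else:
            raise ValueError(f"unsupported model: {model_name}")

    def wrap(self, index):
        # Whitespace split stands in for real tokenization in this sketch.
        words = self.texts[index].split()
        return [self.tokens["bos"], *words, self.tokens["eos"]]


# Example model name containing "xlm-roberta-large" (used here only as a string).
xlmr_name = "microsoft/llmlingua-2-xlm-roberta-large-meetingbank"

# Before the fix: model_name omitted, so BERT-style tokens are used.
buggy = SketchTokenDataset(["hello world"])
# After the fix: model_name forwarded, so XLM-RoBERTa-style tokens are used.
fixed = SketchTokenDataset(["hello world"], model_name=xlmr_name)

print(buggy.wrap(0))
print(fixed.wrap(0))
```

In this sketch the only difference between the two calls is whether `model_name` is forwarded, which mirrors the one-argument nature of the fix described above.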

Before submitting

[ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
[x] Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
[x] Did you make sure to update the documentation with your changes?
[x] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

Sep 13 '24 16:09 cornzz

@iofu728, @XufangLuo

I had to adjust the llmlingua2 tests slightly, as the generated compressed text now differs a bit from before.

Sep 13 '24 16:09 cornzz