`sentencepiece` is no longer maintained and doesn't support newer Python versions
Describe the bug
Please check this thread to see the issues sentencepiece is having and maintainers are ignoring: https://github.com/google/sentencepiece/pull/1120
pyannote is having issues since it depends on speechbrain which depends on sentencepiece: https://github.com/pyannote/pyannote-audio/issues/1890
Even mistral-common and transformers packages are having issues and thinking of removing sentencepiece from dependencies: https://github.com/mistralai/mistral-common/issues/75
Expected behaviour
I expect speechbrain to work on Python 3.13.
To Reproduce
No response
Environment Details
No response
Relevant Log Output
Additional Context
No response
Hi @fcakyon, thanks for bringing this to our attention. I believe we could try to introduce an alternative to Sentencepiece within SpeechBrain that supports Python 3.13+, but this would require a bit of time on our side to think about what to do, as it is unlikely that we will deprecate all the related recipes built on top of Sentencepiece (at least on my side, I don't think that would be wise).
I think we have two options in the meantime:
- lock the Python version to at most `3.12`;
- remove `Sentencepiece` from `requirements.txt`, making `Sentencepiece` optional, although some recipes won't be compatible with Python `3.13`.
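To make the second option more concrete, here is a minimal sketch of how an optional dependency could be guarded at the point of use. The helper names `optional_import` and `load_sentencepiece` are hypothetical and are not part of SpeechBrain; the sketch only uses the standard-library `importlib` module and assumes the tokenizer code would call the loader lazily instead of importing at module level.

```python
import importlib
import importlib.util


def optional_import(module_name, install_hint):
    """Return the named top-level module, or raise ImportError with a hint.

    Hypothetical helper for illustration; not an existing SpeechBrain API.
    """
    # find_spec returns None when a top-level module is not installed.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"'{module_name}' is required for this feature but is not "
            f"installed. {install_hint}"
        )
    return importlib.import_module(module_name)


def load_sentencepiece():
    """Import sentencepiece only when a recipe or model actually needs it."""
    return optional_import(
        "sentencepiece",
        "Install it with `pip install sentencepiece`.",
    )
```

With this pattern, importing the package itself would not fail when the dependency is missing; only the recipes and inference models that call `load_sentencepiece()` would raise, with a message explaining what to install.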
Overall, I tend to think that trying to switch from Sentencepiece to something else is a good thing, but we need to do it carefully.
cc: @pplantinga @TParcollet
@Adel-Moumen thank you for getting back to me so quickly. There is still time to think about how to handle sentencepiece-dependent recipes for Python 3.13, considering that Python 3.12 reaches end-of-life around 2028. But it's good to start thinking about it now.
It's not as easy as I thought to move sentencepiece to the extra requirements/integrations, because many of our inference models depend on it. One alternative is a community build of sentencepiece that supports 3.13:
https://pypi.org/project/dbowring-sentencepiece/
Perhaps another alternative is to convert our sentencepiece models to tokenizers a la https://github.com/huggingface/transformers/issues/25316