speechbrain

`sentencepiece` is no longer maintained and doesn't support newer Python versions

Open • fcakyon opened this issue 5 months ago • 5 comments

Describe the bug

Please see this thread for the issues sentencepiece is having, which its maintainers are ignoring: https://github.com/google/sentencepiece/pull/1120

pyannote is having issues because it depends on speechbrain, which depends on sentencepiece: https://github.com/pyannote/pyannote-audio/issues/1890

Even the mistral-common and transformers packages are having issues and are considering removing sentencepiece from their dependencies: https://github.com/mistralai/mistral-common/issues/75

Expected behaviour

I expect speechbrain to work on Python 3.13.

To Reproduce

No response

Environment Details

No response

Relevant Log Output


Additional Context

No response

fcakyon, Jul 12 '25 22:07

Hi @fcakyon, thanks for bringing this to our attention. I believe we could introduce an alternative to Sentencepiece within SpeechBrain that supports Python 3.13+, but this will require some time on our side to work out, since deprecating all the recipes built on top of Sentencepiece is unlikely (and, at least from my side, I don't think it would be wise).

I think we have two options in the meantime:

  1. pin the Python version to 3.12 or earlier;
  2. remove Sentencepiece from requirements.txt, making it optional, at the cost of some recipes not being compatible with Python 3.13.
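Option 2 usually means importing the package lazily, only inside the code paths that need it. A minimal sketch, assuming nothing about SpeechBrain's internals: `optional_import` and `load_bpe_tokenizer` are hypothetical names, not existing SpeechBrain functions; only `sentencepiece.SentencePieceProcessor(model_file=...)` is taken from the real sentencepiece Python API.

```python
import importlib
from types import ModuleType


def optional_import(module_name: str, feature: str) -> ModuleType:
    """Import an optional dependency, or fail with an actionable message.

    Hypothetical helper, not an existing SpeechBrain API.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Re-raise with a message naming the feature that needs the package,
        # chaining the original error so the root cause stays visible.
        # Assumes the pip distribution name equals the module name.
        raise ImportError(
            f"{feature} requires the optional package '{module_name}', "
            f"which is not installed. Install it with: pip install {module_name}"
        ) from err


def load_bpe_tokenizer(model_path: str):
    """Hypothetical recipe code path that needs sentencepiece only when called."""
    spm = optional_import("sentencepiece", "The BPE tokenizer")
    return spm.SentencePieceProcessor(model_file=model_path)
```

With this pattern, importing the library itself no longer requires sentencepiece; only calling a sentencepiece-backed path on an interpreter without it fails, and the error names the missing package. The install hint assumes the pip name matches the module name, which holds for sentencepiece but not for every package.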

Overall, I think trying to switch from Sentencepiece to something else is a good idea, but we need to do it carefully.

Adel-Moumen, Jul 13 '25 09:07

cc: @pplantinga @TParcollet

Adel-Moumen, Jul 13 '25 09:07

@Adel-Moumen, thank you for getting back to me so quickly. There is still time to decide how to handle sentencepiece-dependent recipes on Python 3.13, since Python 3.12 reaches end of life around 2028. But it's good to start thinking about it now.

fcakyon, Jul 13 '25 12:07

It's not as easy as I thought to move sentencepiece to the extra requirements/integrations, because many of our inference models depend on it. One alternative is a community build of sentencepiece that supports Python 3.13:

https://pypi.org/project/dbowring-sentencepiece/
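If the community build were adopted, one way to restrict it to the interpreters that need it is a PEP 508 environment marker in the dependency list. A minimal sketch, assuming `dbowring-sentencepiece` is a drop-in replacement that installs the same `sentencepiece` import name (not verified here); `INSTALL_REQUIRES` is a hypothetical name for the list passed to `setup()`:

```python
# Hypothetical install_requires list for setup.py.
# ASSUMPTION: dbowring-sentencepiece provides the same `sentencepiece`
# module as the official package; this has not been verified.
INSTALL_REQUIRES = [
    # Official wheels on interpreters where they are published.
    'sentencepiece; python_version < "3.13"',
    # Community build only on interpreters lacking official wheels.
    'dbowring-sentencepiece; python_version >= "3.13"',
]
```

pip evaluates each marker at install time, so exactly one of the two distributions is installed for any given interpreter.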

pplantinga, Jul 18 '25 19:07

Another alternative might be to convert our sentencepiece models to tokenizers, as in https://github.com/huggingface/transformers/issues/25316

pplantinga, Jul 22 '25 13:07