
French (doc add)

Open EtienneAb3d opened this issue 3 years ago • 4 comments

As requested in #41, here is how I succeeded in running contextualSpellCheck for French.

Use the French spaCy model:

nlp = spacy.load("fr_core_news_sm")

Use camembert/camembert-base-ccnet:

nlp.add_pipe("contextual spellchecker", config={"max_edit_dist": 4, "model_name": "camembert/camembert-base-ccnet"})

These dependencies are needed:

pip install sentencepiece
pip install protobuf==3.20
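
Put together, the steps above might look like the following minimal sketch. It assumes spaCy, contextualSpellCheck, and the two dependencies are installed and that the French model has been downloaded (for example with `python -m spacy download fr_core_news_sm`). The helper name `build_french_pipeline` and the two constants are illustrative only and are not part of the library.

```python
# Settings taken from the steps above.
FRENCH_SPACY_MODEL = "fr_core_news_sm"
SPELLCHECK_CONFIG = {
    "max_edit_dist": 4,
    "model_name": "camembert/camembert-base-ccnet",
}


def build_french_pipeline():
    """Load the French spaCy model and attach the spell checker (hypothetical helper)."""
    # Imports are kept inside the function so the settings above can be
    # inspected even on a machine where the packages are not installed yet.
    import spacy
    import contextualSpellCheck  # noqa: F401  (importing registers the "contextual spellchecker" factory)

    nlp = spacy.load(FRENCH_SPACY_MODEL)
    nlp.add_pipe("contextual spellchecker", config=SPELLCHECK_CONFIG)
    return nlp
```

After `nlp = build_french_pipeline()` and `doc = nlp("...")`, the results should be available through the usual extension attributes, such as `doc._.performed_spellCheck` and `doc._.outcome_spellCheck`.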

Remark: spaces are lost in the result, so post-processing is needed to restore them properly.
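
The thread does not say what that post-processing looks like, so the following is only one possible sketch, not the approach actually used here. It assumes the spacing can be taken from the original tokens (in spaCy, `token.text` and `token.whitespace_`) and that a suggestion may carry the SentencePiece word-boundary marker "▁". The function name `rebuild_text` and its input format are hypothetical.

```python
from typing import Dict, List, Tuple

# "▁" is the word-boundary marker used by SentencePiece tokenizers.
SENTENCEPIECE_MARKER = "\u2581"


def rebuild_text(tokens: List[Tuple[str, str]], suggestions: Dict[int, str]) -> str:
    """Rebuild a corrected sentence while keeping the original spacing.

    tokens: (text, trailing_whitespace) pairs in document order.
    suggestions: token index -> replacement word (hypothetical input format).
    """
    parts = []
    for index, (text, whitespace) in enumerate(tokens):
        word = suggestions.get(index, text)
        # Drop the tokenizer marker and stray spaces from the replacement.
        word = word.replace(SENTENCEPIECE_MARKER, "").strip()
        # Fall back to the original token if the cleaned suggestion is empty.
        parts.append((word or text) + whitespace)
    return "".join(parts)


tokens = [("Je", " "), ("mange", " "), ("une", " "), ("pome", ""), (".", "")]
print(rebuild_text(tokens, {3: "\u2581pomme"}))  # Je mange une pomme.
```

With a spaCy `doc`, the inputs could presumably be built as `[(t.text, t.whitespace_) for t in doc]` and `{t.i: s for t, s in doc._.suggestions_spellCheck.items()}`, assuming `suggestions_spellCheck` maps tokens to suggested strings.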

PS: the flaubert/flaubert_large_cased model needs this additional dependency:

pip install sacremoses

EtienneAb3d avatar Aug 24 '22 08:08 EtienneAb3d

Hey @EtienneAb3d, thank you for raising this request. It is excellent to know you were able to use it successfully for French!

Would you like to raise a PR to add an example for the French language, similar to the other examples? I would be happy to merge the PR, as it would be a great addition for people using it for French!

If you have any suggestions or other feedback, feel free to highlight them.

R1j1t avatar Aug 25 '22 16:08 R1j1t

Hi @R1j1t, perhaps I will find the time to build such a PR later. But, on the team side, if you have direct edit access, it's only a few lines to add to the doc. ;-)

EtienneAb3d avatar Aug 26 '22 04:08 EtienneAb3d

No worries!

R1j1t avatar Aug 26 '22 11:08 R1j1t

Also note that, in addition to @EtienneAb3d's steps, in a Jupyter Notebook you need to restart the kernel after installing protobuf:

!pip uninstall -y protobuf
!pip install protobuf==3.20
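
Once the kernel has been restarted, a quick check along these lines can confirm which protobuf version is installed. This is only a sketch: the helper names are hypothetical, and it reads the installed package metadata through the standard `importlib.metadata` module.

```python
from importlib import metadata


def protobuf_matches(installed: str, wanted: str = "3.20") -> bool:
    """Return True if `installed` is `wanted` or a patch release of it."""
    return installed == wanted or installed.startswith(wanted + ".")


def installed_protobuf_version():
    """Return the installed protobuf version string, or None if it is absent."""
    try:
        return metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        return None


version = installed_protobuf_version()
if version is None:
    print("protobuf is not installed")
elif protobuf_matches(version):
    print(f"protobuf {version} matches the pinned 3.20 release")
else:
    print(f"protobuf {version} is installed; 3.20 was expected")
```

Note that this reports what is installed on disk. A copy of protobuf imported before the reinstall stays loaded in memory until the kernel is restarted.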

Also, @EtienneAb3d, how did you manage the lost spaces issue?

mtx-z avatar Apr 30 '24 13:04 mtx-z