Can I use a user dictionary?
I want to use a user dictionary. How can I do that?
BertSudachipyTokenizer takes a sudachipy_kwargs argument, which is used to initialize the Sudachi tokenizer.
https://github.com/WorksApplications/SudachiTra/blob/3f4a6c3a976a2b047a7714192928e7ac229fa699/sudachitra/tokenization_bert_sudachipy.py#L173
https://github.com/WorksApplications/SudachiTra/blob/3f4a6c3a976a2b047a7714192928e7ac229fa699/sudachitra/sudachipy_word_tokenizer.py#L47C1-L71C12
Prepare a config file (see the user dictionary section) and pass its path via config_path, e.g. sudachipy_kwargs={"config_path": "path/to/your/config"}.
Note that the final output of the SudachiTra tokenizer depends on its vocabulary, so user-defined words may still be split based on that vocabulary.
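The config-file route described above could look roughly like the sketch below. It only writes the config file and builds the sudachipy_kwargs dict. The helper name write_sudachi_config, the file name sudachi_user.json, and the path path/to/user.dic are hypothetical, and the sketch assumes that a config listing only "userDict" is enough because the remaining settings fall back to the defaults.

```python
import json
from pathlib import Path


def write_sudachi_config(config_path, user_dict_paths):
    """Write a minimal Sudachi config file and return matching sudachipy_kwargs.

    Hypothetical helper. Assumes a config that only lists "userDict" is
    sufficient and that other settings fall back to Sudachi's defaults.
    """
    config = {"userDict": [str(p) for p in user_dict_paths]}
    path = Path(config_path)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    # This dict is what gets passed to the tokenizer as sudachipy_kwargs.
    return {"config_path": str(path)}


# Usage (paths are placeholders; the user dictionary must already be built):
# sudachipy_kwargs = write_sudachi_config("sudachi_user.json", ["path/to/user.dic"])
# tokenizer = BertSudachipyTokenizer(..., sudachipy_kwargs=sudachipy_kwargs)
```

The remaining tokenizer arguments are elided on purpose, since they depend on the model and vocabulary you use.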
With the latest version, it is also possible to use a sudachipy.config.Config object by passing it (or its JSON representation) as the config_path parameter. This change was made specifically to support using Sudachi inside tokenizers while keeping backward compatibility.
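For the JSON-representation variant, a minimal sketch could build the string by hand instead of writing a file. The helper name inline_config_kwargs and the dictionary path are hypothetical, and it is an assumption that the inline JSON uses the same "userDict" key as the config file.

```python
import json


def inline_config_kwargs(user_dict_paths):
    """Return sudachipy_kwargs carrying the config as a JSON string.

    Hypothetical helper. Assumes the inline JSON accepts the same
    "userDict" key as a Sudachi config file.
    """
    config_json = json.dumps({"userDict": [str(p) for p in user_dict_paths]})
    # The JSON string goes where a file path would normally go.
    return {"config_path": config_json}


# Usage (path is a placeholder):
# sudachipy_kwargs = inline_config_kwargs(["path/to/user.dic"])
# tokenizer = BertSudachipyTokenizer(..., sudachipy_kwargs=sudachipy_kwargs)
```

This avoids keeping a separate config file next to the model, at the cost of requiring a version that accepts inline configs.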