Is it possible to change the spaCy pipeline?
Hi @d99kris, I really like your implementation. I've implemented a project with it and was wondering if it is possible to influence the pipeline of spaCy within this wrapper?
Best, Benny
Hi @BennyDietrich - thanks!
Customizing the spaCy pipeline is currently not supported by spacy-cpp.
Let me take a look and see if it can be supported. If it's a small effort I might take a stab at it, but I'll have to look into it and get back to you.
Hi @d99kris,
thanks for considering adding this for me! Since we are already in contact, I thought I could ask you another thing about spacy-cpp. I'm trying to use spacy-cpp with multi-threading. It runs into segmentation faults as soon as two threads try to access tokens from two different doc objects at the same time.
I've tried different things to prevent this behavior. First I tried creating separate NLP objects, from which I created the different docs. Since this didn't help, I tried creating separate Spacy objects and generating separate NLP and Doc objects from them, still without success.
I'm not sure where the problem comes from. Could it be that I'm currently not taking care of the GIL? I'm sure you have a lot to do, so I don't want to hold you up. I just thought you might have encountered a similar problem yourself.
Hi @BennyDietrich - regarding pipeline support it seems it would be non-trivial to add, so I'll have to consider it out of scope for spacy-cpp due to somewhat limited resources, and having too many open source projects. 🙂
As for multi-threaded usage, yes, I think your suspicion is right. Generally, Python and threading require careful consideration. That said, I would expect that creating one Spacy instance, sharing it across threads, and acquiring a mutex whenever calling into any Spacy APIs should be safe, although there will not be much parallelism. Personally, I have not tried doing any threading with Python.
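A minimal sketch of the mutex approach described above, untested against spacy-cpp itself. `count_tokens_unsafe` is a hypothetical stand-in for whatever spacy-cpp calls a worker would make (parsing a text and reading its tokens); here it just splits on whitespace so the example compiles without spaCy. `g_spacy_mutex`, `count_tokens_locked` and `count_all` are likewise illustrative names, not part of spacy-cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// One process-wide lock guarding every call that would go into Python.
static std::mutex g_spacy_mutex;

// Hypothetical stand-in for the spacy-cpp work done per text.
// Assumption: in a real program this body would be the spacy-cpp calls
// (parse the text, iterate over the tokens) instead of whitespace splitting.
static std::size_t count_tokens_unsafe(const std::string& text)
{
    std::istringstream stream(text);
    std::string token;
    std::size_t count = 0;
    while (stream >> token)
    {
        ++count;
    }
    return count;
}

// Serialized wrapper: at most one thread is inside the guarded call.
static std::size_t count_tokens_locked(const std::string& text)
{
    std::lock_guard<std::mutex> lock(g_spacy_mutex);
    return count_tokens_unsafe(text);
}

// Starts one worker thread per text and collects the per-text results.
// Each thread writes only to its own element of `counts`.
static std::vector<std::size_t> count_all(const std::vector<std::string>& texts)
{
    std::vector<std::size_t> counts(texts.size(), 0);
    std::vector<std::thread> workers;
    workers.reserve(texts.size());
    for (std::size_t i = 0; i < texts.size(); ++i)
    {
        workers.emplace_back([&texts, &counts, i] {
            counts[i] = count_tokens_locked(texts[i]);
        });
    }
    for (auto& worker : workers)
    {
        worker.join();
    }
    assert(counts.size() == texts.size());
    return counts;
}
```

The lock has to cover everything that touches the Python-backed objects, including reading tokens from a doc, not only the parse call. Because the guarded section is where the actual work happens, the threads effectively run one after another, which is the lack of parallelism mentioned above. This sketch only serializes access and does not do any GIL handling of its own.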
I will proceed to close this issue.