Reuse StringGazetteer Object

Open luisenriqueramos1977 opened this issue 2 years ago • 5 comments

Is your feature request related to a problem? Please describe.

I can currently create a StringGazetteer from a text file containing a list of terms, and this works without any problem. However, because this is part of a repetitive process, I would like to be able to store the StringGazetteer object and reuse it. Is such a feature available, or is there another way to achieve this?

Luis Ramos

luisenriqueramos1977 avatar Apr 16 '23 13:04 luisenriqueramos1977

Thank you for this feature request!

I think it would be a good idea to be able to store and load a gazetteer instance. However, this is not currently possible out of the box: the code would need some work before the instance can be, for example, pickled.

The main issue I see at the moment is that, for the object to be picklable, any lambdas must either be replaced with callables that can be pickled, or we must hook into the pickle process by implementing our own __getstate__ and __setstate__ methods.
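To illustrate the problem and the second option, here is a minimal sketch using toy classes. `NaiveMatcher` and `Matcher` are hypothetical stand-ins, not gatenlp code; the sketch only assumes that the object keeps a lambda as an attribute, which the standard pickle module cannot serialize.

```python
import pickle


class NaiveMatcher:
    """Hypothetical toy class that stores a lambda as an attribute."""

    def __init__(self):
        self._normalize = lambda s: s.lower()


class Matcher:
    """Hypothetical toy class that drops the lambda when pickled and rebuilds it on load."""

    def __init__(self, ignore_case=False):
        self.ignore_case = ignore_case
        self._normalize = self._make_normalizer()

    def _make_normalizer(self):
        # The lambda is derived from plain, picklable configuration.
        if self.ignore_case:
            return lambda s: s.lower()
        return lambda s: s

    def normalize(self, text):
        return self._normalize(text)

    def __getstate__(self):
        # Copy the instance state and leave out the unpicklable lambda.
        state = self.__dict__.copy()
        del state["_normalize"]
        return state

    def __setstate__(self, state):
        # Restore the plain attributes, then recreate the lambda.
        self.__dict__.update(state)
        self._normalize = self._make_normalizer()


# The naive class cannot be pickled with the standard library.
try:
    pickle.dumps(NaiveMatcher())
    naive_picklable = True
except (pickle.PicklingError, AttributeError, TypeError):
    naive_picklable = False

# The class with __getstate__/__setstate__ survives a round trip.
restored = pickle.loads(pickle.dumps(Matcher(ignore_case=True)))
```

The design choice here is to pickle only the configuration (`ignore_case`) and rebuild the callable after loading, so no function object ever has to be serialized.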

johann-petrak avatar Apr 17 '23 09:04 johann-petrak

Actually, I think you could try doing this with the cloudpickle package. Install the package into your environment, then do something like this to save and restore the gazetteer instance:

Save:

import cloudpickle
with open("gaz1.pkl", "wb") as outf:
    cloudpickle.dump(gaz1, outf)

Load:

import cloudpickle
with open("gaz1.pkl", "rb") as inf:
    gaz1 = cloudpickle.load(inf)
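For a repetitive process, the save and load steps could be wrapped in a small caching helper. This is only a sketch: `load_or_build` and its parameters are hypothetical names, and it defaults to the standard pickle module so that it runs without extra packages. Any module offering compatible `dump` and `load` functions, such as cloudpickle, can be passed as `serializer`.

```python
import os
import pickle


def load_or_build(cache_path, build, serializer=pickle):
    """Return the cached object if the file exists, otherwise build and cache it.

    `build` is a zero-argument callable that creates the object from scratch.
    `serializer` is any module with pickle-compatible dump() and load().
    """
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as inf:
            return serializer.load(inf)
    obj = build()
    with open(cache_path, "wb") as outf:
        serializer.dump(obj, outf)
    return obj
```

With cloudpickle installed, a call might look like `gaz1 = load_or_build("gaz1.pkl", make_gazetteer, serializer=cloudpickle)`, where `make_gazetteer` is a hypothetical function that builds the gazetteer from the term list. Pickle files can execute arbitrary code when loaded, so only load files you created yourself or otherwise trust.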

Does that work for you?

johann-petrak avatar Apr 17 '23 10:04 johann-petrak

Alternatively, you could try using the dill package (https://github.com/uqfoundation/dill) in a similar way.

johann-petrak avatar Apr 17 '23 10:04 johann-petrak

Dear Petrak,

Thanks for the information. I was able to test it and got similar results for most gazetteers, but for one of them I got the following error:

Cannot add gazetteer entry '', matches root node

It seems there is a blank entry in the list, but I used the same procedure for all of them, so I have no idea how to solve this problem.

Best regards

Luis Ramos

luisenriqueramos1977 avatar May 30 '23 18:05 luisenriqueramos1977

It seems there is a blank space in the list,

What do you mean by that? Would you be able to cut the list down to the shortest list that produces that error and share it with me, either here or by private email?
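One quick way to narrow this down could be to scan the term list for entries that are empty once surrounding whitespace is removed. The sketch below assumes one term per line and assumes that the error is caused by such an entry, which has not been confirmed here; `find_blank_entries` and `clean_terms` are hypothetical helper names.

```python
def find_blank_entries(lines):
    """Return (line_number, raw_line) pairs for lines that are empty after stripping."""
    blanks = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            blanks.append((lineno, line))
    return blanks


def clean_terms(lines):
    """Return the stripped terms, skipping lines that contain only whitespace."""
    return [line.strip() for line in lines if line.strip()]
```

Both functions accept any iterable of strings, so an open file can be passed directly, for example `with open("terms.txt", encoding="utf-8") as inf: print(find_blank_entries(inf))`, where `terms.txt` is a placeholder file name.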

johann-petrak avatar May 30 '23 20:05 johann-petrak