rnnmorph

Implementation in a loop clogs up memory

Open molokanov50 opened this issue 1 year ago • 1 comments

I need to determine the grammatical case of terms in the texts of a large dataset. I found that memory usage grows by 0.3 to 0.7 MB on virtually every call of forms = predictor.predict(terms). Consider a simple example:

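import re

# `predictor` is assumed to be an RNNMorphPredictor instance created beforehand.
def findCase(termNumber, text):  # find the grammatical case of the term at the given index in the text
    terms = text.split()
    forms = predictor.predict(terms)
    myTag = forms[termNumber].tag
    parts = re.split(r'\|', myTag)  # the tag is a '|'-separated list of key=value features
    for part in parts:
        subparts = re.split('=', part)
        if len(subparts) < 2:
            continue  # skip features that have no value
        if subparts[0] == 'Case':
            return subparts[1].upper()
    return 'UNDEF'  # no Case feature found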
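
To separate the tag parsing from the predictor call when checking where the memory goes, the parsing step can be isolated. This is a minimal sketch: the helper name extract_case is hypothetical, and the sample tag strings only illustrate the '|'-separated key=value layout that the code above relies on; they are not actual predictor output.

```python
def extract_case(tag: str) -> str:
    """Return the upper-cased value of the 'Case' feature, or 'UNDEF'."""
    for part in tag.split('|'):
        key, sep, value = part.partition('=')
        # Skip features without '=' and anything that is not the Case feature.
        if sep and key == 'Case':
            return value.upper()
    return 'UNDEF'


# Illustrative tag strings (assumed format, not real model output).
print(extract_case('Case=Nom|Gender=Masc|Number=Sing'))  # NOM
print(extract_case('_'))                                 # UNDEF
```

This function allocates nothing that outlives the call, so it can be ruled out as a source of the growth independently of predictor.predict.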

Then, given a collection of texts, I can run:

myDict = {}
for i in range(len(texts)):
    case = findCase(0, texts[i])
    myDict[i] = case

I have 12500 texts with an average length of about 700 characters each. Processing the whole dataset took an extra 1.5 GB of memory because of the predictor.predict(terms) calls. It looks as if my local variable forms stays in memory after the function returns. Or is the RNNMorphPredictor model perhaps training itself in this scenario? How can I free this memory?

molokanov50, Feb 09 '23 08:02