tika-python

How to handle parsing failures when iterating over more than 100k files in one run?

Open user06039 opened this issue 3 years ago • 0 comments

I'm using the Apache Tika Python client (tika-python) to parse PDF files, and I have more than a million documents. Tika seems to have some limitation: after parsing roughly 100k files, it starts failing to parse new PDFs when I call:

from tika import parser
parsed = parser.from_file('/path/to/file')

Is this a common issue? How can I handle it? Is it possible to restart Tika directly from my Python code so that parsing keeps working? Please help.
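To make the question concrete, here is a minimal sketch of the kind of retry-and-restart loop I have in mind. It is not something tika-python provides: `parse_all`, `parse_fn`, and `restart_fn` are hypothetical names, and the sketch assumes a failed parse surfaces as a Python exception. The parser and the restart step are passed in as plain callables so the loop itself does not depend on how the Tika server is managed.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple


def parse_all(
    paths: Iterable[str],
    parse_fn: Callable[[str], dict],   # hypothetical: e.g. tika's parser.from_file
    restart_fn: Callable[[], None],    # hypothetical: whatever restarts the Tika server
    max_retries: int = 2,
    backoff_seconds: float = 1.0,
) -> Tuple[Dict[str, dict], List[str]]:
    """Parse each path, restarting the server and retrying when a parse fails.

    Returns (results, failed): parsed output keyed by path, and the list of
    paths that still failed after max_retries retries.
    """
    results: Dict[str, dict] = {}
    failed: List[str] = []
    for path in paths:
        for attempt in range(max_retries + 1):
            try:
                results[path] = parse_fn(path)
                break  # parsed successfully, move on to the next file
            except Exception:
                if attempt == max_retries:
                    # Out of retries: record the path and keep going
                    failed.append(path)
                else:
                    # Assumption: the failure is server-side, so restart and back off
                    restart_fn()
                    time.sleep(backoff_seconds * (attempt + 1))
    return results, failed
```

With tika-python, `parse_fn` could be `parser.from_file`. What `restart_fn` should do is exactly the open question: one option might be to run the Tika server as a separate process that my code can restart, and point the client at it through the `serverEndpoint` argument of `parser.from_file`. Recording failed paths instead of aborting would at least let a run over a million files finish and be retried later.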

user06039 · Jun 07 '21 23:06