tika-python

How can I get structured output for compressed files, such as rar or zip?

Open NLPOR opened this issue 2 years ago • 0 comments

When parsing a compressed file, I found that the contents of all the files inside it are mixed together in the single content field. For example, given test.zip containing test/a.txt and test/b.txt, after parsed = parser.from_file('test.zip') I get parsed['content']=..... and parsed['metadata']=...... How can I get a separate 'content' and 'metadata' for each file in the archive, keyed by that file's name?

NLPOR, Dec 03 '21 02:12
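
One possible workaround, offered as a minimal sketch rather than a feature of tika-python itself: unpack the archive with Python's standard zipfile module and parse each member on its own, so every file keeps its own 'content' and 'metadata'. The helper name parse_zip_members and its parse_file argument are hypothetical, introduced only for illustration. The sketch assumes parse_file behaves like parser.from_file (takes a path, returns a dict with 'content' and 'metadata' keys). It handles zip only; rar would need a third-party extractor.

```python
import tempfile
import zipfile


def parse_zip_members(zip_path, parse_file):
    """Parse each file in a zip separately, keyed by its path in the archive.

    parse_file is a hypothetical injection point: any callable that takes a
    file path and returns a dict with 'content' and 'metadata' keys
    (assumed here to be something like tika's parser.from_file).
    """
    results = {}
    with zipfile.ZipFile(zip_path) as archive, \
            tempfile.TemporaryDirectory() as tmp_dir:
        for info in archive.infolist():
            # Directory entries carry no content of their own.
            if info.is_dir():
                continue
            # extract() returns the path of the file it wrote to disk.
            extracted_path = archive.extract(info, path=tmp_dir)
            parsed = parse_file(extracted_path)
            results[info.filename] = {
                "content": parsed.get("content"),
                "metadata": parsed.get("metadata"),
            }
    return results
```

With tika-python installed and a Tika server available, the call would presumably look like parse_zip_members('test.zip', parser.from_file), giving a dict with one entry each for test/a.txt and test/b.txt. Passing the parser in as an argument keeps the sketch independent of how Tika is configured.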