tika-python
How can I get structured output for compressed files such as RAR or ZIP?
When I parse a compressed file, the contents of all the files inside the archive are mixed together in the single content field.
For example, test.zip contains test/a.txt and test/b.txt. After running
parsed = parser.from_file('test.zip')
parsed['content'] = .....
parsed['metadata'] = ......
How can I get the 'content' and 'metadata' separately for each file in the archive, keyed by that file's name?
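To make the desired output concrete, here is a minimal sketch of the grouping I have in mind. It assumes the raw recursive-metadata (rmeta) JSON from the Tika server is available as a list with one entry per document, where each embedded file carries an "X-TIKA:embedded_resource_path" key and its text under "X-TIKA:content". The helper name split_by_embedded_path, the "<container>" label, and the sample data are all hypothetical, and the exact path values Tika reports for a given archive may differ.

```python
import json

# Keys assumed to be present in each entry of Tika's recursive-metadata output.
CONTENT_KEY = "X-TIKA:content"
PATH_KEY = "X-TIKA:embedded_resource_path"


def split_by_embedded_path(entries, container_name="<container>"):
    """Group rmeta entries into {path: {"content": ..., "metadata": ...}}.

    The outer archive usually has no embedded path, so it is stored
    under the hypothetical label given by container_name.
    """
    result = {}
    for entry in entries:
        path = entry.get(PATH_KEY, container_name)
        # Everything except the extracted text is treated as metadata.
        metadata = {k: v for k, v in entry.items() if k != CONTENT_KEY}
        result[path] = {"content": entry.get(CONTENT_KEY), "metadata": metadata}
    return result


# Hypothetical sample shaped like an rmeta response for test.zip;
# this is illustrative data, not real Tika output.
sample = json.loads("""
[
  {"Content-Type": "application/zip"},
  {"Content-Type": "text/plain",
   "X-TIKA:embedded_resource_path": "/test/a.txt",
   "X-TIKA:content": "hello from a"},
  {"Content-Type": "text/plain",
   "X-TIKA:embedded_resource_path": "/test/b.txt",
   "X-TIKA:content": "hello from b"}
]
""")

structured = split_by_embedded_path(sample)
for path, doc in structured.items():
    print(path, repr(doc["content"]), doc["metadata"].get("Content-Type"))
```

How to obtain the raw list from tika-python is the open part of the question: the lower-level tika.tika.parse1('all', 'test.zip') call may return the unmerged JSON text as the second element of its result, but I have not verified that against the current release.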