tika-python

make tika CLI similar to parser.from_file

Open · vedal opened this issue 3 years ago · 1 comment

Thanks a lot for tika-python. It's fast and awesome! 🥇

I suggest the following change to make the command line tool, $ tika-python parse all file.pdf, behave more like the Python function tika.parser.from_file("file.pdf", service='all').

Currently, the command line tool outputs content as XHTML by default, whereas the Python function returns plain text unless you pass xmlContent=True (the argument defaults to False). I found this difference unexpected, and I could not find a way to request plain-text output from the command line tool.
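To illustrate the proposed default, here is a minimal sketch of a standalone wrapper script (not tika-python's actual CLI). It returns plain text unless asked for XHTML, forwarding the choice to the xmlContent argument of tika.parser.from_file. The program name tika-parse-sketch, the --xml flag, and the --service option are hypothetical, chosen only for illustration.

```python
import argparse


def build_arg_parser():
    # Hypothetical CLI whose defaults mirror parser.from_file(path, service='all', xmlContent=False)
    p = argparse.ArgumentParser(prog="tika-parse-sketch")
    p.add_argument("files", nargs="+", help="documents to parse")
    p.add_argument("--service", default="all", help="passed through as from_file's service argument")
    p.add_argument(
        "--xml",
        action="store_true",
        help="return XHTML (xmlContent=True) instead of the default plain text",
    )
    return p


def main(argv=None):
    args = build_arg_parser().parse_args(argv)
    # Imported lazily so argument parsing works even without tika installed;
    # the actual parse needs a reachable Tika server.
    from tika import parser

    for path in args.files:
        parsed = parser.from_file(path, service=args.service, xmlContent=args.xml)
        # from_file returns a dict; "content" may be None for empty documents
        print(parsed.get("content") or "")
```

With this layout, tika-parse-sketch file.pdf would print plain text like from_file does by default, and tika-parse-sketch --xml file.pdf would opt in to XHTML. Making plain text the default is exactly the back-compat question for existing CLI users.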

vedal · Dec 01 '20 07:12

Coverage Status

Coverage remained the same at 47.645% when pulling e2e602a89012863cc074871f71a7b6c853d4e26a on vedal:master into d692c0ffa6b85d099019de9b94888fb4c2a48040 on chrismattmann:master.

coveralls · Dec 01 '20 07:12

This is an interesting suggestion, thank you @vedal. In the PR, I would rather not just comment out the old code; I'd rather see a clean update. Also, how do you think this will affect backward compatibility for other users? I'm going to schedule this for tika-next (the post-release milestone) for discussion. Thanks.

chrismattmann · Dec 31 '22 20:12

Will close this for now. If you still want this, as I mentioned, it needs a better patch and some documentation explaining the change for users. Thanks for the idea though, @vedal.

chrismattmann · Jan 01 '23 22:01