tika-python
make tika CLI similar to parser.from_file
Thanks a lot for tika-python. It's fast and awesome! 🥇
I suggest the following change to make the command line tool
$ tika-python parse all file.pdf
behave more like the inline Python function
tika.parser.from_file("file.pdf", service='all')
Currently, the command line tool produces XHTML content by default, while the inline function produces plain text unless the argument xmlContent=True is passed (it is False by default). This difference was unexpected, and it is unclear how to request plain-text output from the command line tool.
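To make the request concrete, here is a minimal sketch of how a command line wrapper could mirror the defaults of tika.parser.from_file. The `--xml` and `--service` flags, the `build_arg_parser` and `run` helpers, and the program name are all hypothetical and are not part of the current tika-python CLI. Only `from_file` with its `service` and `xmlContent` arguments comes from the library, as used above.

```python
import argparse


def build_arg_parser():
    """Hypothetical CLI: plain text by default, XHTML only when --xml is given."""
    ap = argparse.ArgumentParser(prog="tika-python-sketch")
    ap.add_argument("path", help="file to parse")
    ap.add_argument("--service", default="all", choices=["all", "meta", "text"],
                    help="which Tika service to call (assumed flag)")
    ap.add_argument("--xml", action="store_true",
                    help="emit XHTML content instead of plain text (assumed flag)")
    return ap


def run(argv, parse_fn=None):
    """Parse argv and forward the options to a from_file-style function."""
    args = build_arg_parser().parse_args(argv)
    if parse_fn is None:
        # Imported lazily so the sketch can be exercised without a Tika server.
        from tika import parser
        parse_fn = parser.from_file
    # xmlContent is False unless --xml is passed, matching the inline default.
    return parse_fn(args.path, service=args.service, xmlContent=args.xml)
```

With this shape, `run(["file.pdf"])` would request plain text and `run(["file.pdf", "--xml"])` would request XHTML. Existing users who rely on XHTML output would have to add the flag, which is a backward-compatibility cost of the proposal.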
Coverage remained the same at 47.645% when pulling e2e602a89012863cc074871f71a7b6c853d4e26a on vedal:master into d692c0ffa6b85d099019de9b94888fb4c2a48040 on chrismattmann:master.
This is an interesting suggestion, thank you @vedal. In the PR, I would rather not just comment out the old code; I'd rather see a clean update. Also, how do you think this will affect backward compatibility for other users? I'm going to schedule this for tika-next (the post-release milestone for discussion). Thanks.
I will close this for now. If you still want this, then as I mentioned it needs a cleaner patch and some documentation explaining the change for users. Thanks for the idea though, @vedal.