tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.

Results 48 tika-python issues

With the advent of [TIKA-3329](https://github.com/apache/tika/pull/419/files), we can now have a full translation engine in Tika-Python that supports translation from more than 300 languages to English. Standardize on this. It requires Tika 2.0 though,...

enhancement
py3

I wanted to use the library with a file that I get from another server, thus I already had the file in memory. It took me a while to understand...
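A minimal sketch of parsing data that is already in memory, assuming tika-python's `parser.from_buffer` is the intended entry point for this case. The helper name `parse_in_memory` and its `parse_buffer` parameter are hypothetical; the parameter exists only so the parsing call can be swapped out where no Tika server is available.

```python
def parse_in_memory(data, parse_buffer=None):
    """Parse bytes or text that is already in memory, without writing a temp file.

    parse_buffer is a hypothetical hook: any callable that takes the data and
    returns a parse result. By default it falls back to tika's parser.from_buffer.
    """
    if parse_buffer is None:
        # Imported lazily: needs tika-python installed and a reachable Tika server.
        from tika import parser
        parse_buffer = parser.from_buffer
    return parse_buffer(data)
```

With tika-python installed, `parse_in_memory(response_bytes)` would hand the downloaded bytes straight to the Tika server; `response_bytes` here stands for whatever the other server returned.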

Hi @chrismattmann, fantastic library! I was wondering if you have near-term plans or a roadmap to make it compatible with Apache Tika version 2.1.0. I used the `tika-server-standard-2.1.0.jar` file from `https://tika.apache.org/download.html` to...

Even after Tika server is started, the while body will keep being executed until max retries is reached. It should break out of the loop upon successful startup.
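The behavior requested above can be sketched as a bounded retry loop that stops at the first successful check. This is an illustration only, not the library's actual startup code: `wait_for_startup`, `check_started`, `max_retries`, and `delay_seconds` are all hypothetical names.

```python
import time


def wait_for_startup(check_started, max_retries=3, delay_seconds=0.0):
    """Poll check_started() up to max_retries times.

    Returns (started, attempts). The loop breaks as soon as the check
    succeeds instead of running until the retry budget is exhausted.
    """
    started = False
    attempts = 0
    while attempts < max_retries:
        attempts += 1
        if check_started():
            started = True
            break  # server is up: stop retrying immediately
        time.sleep(delay_seconds)
    return started, attempts
```

Returning the attempt count alongside the flag makes it easy to confirm that no extra iterations run after a successful start.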

Hi, Tika works fine until I restart the machine; after that I need to reinstall it, or I get this error message: ``` 2022-01-16 20:09:48,737 [MainThread ] [INFO ] Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.24/tika-server-1.24.jar...

This pull request enables headers specification in `unpack.from_file`.

I'm using Apache Tika to OCR a bunch of PDFs. When I use the GUI (by running `java -jar tika-app-1.22.jar`) everything works fine: I go to "Recursive JSON" on the...

I found that when parsing compressed files, the contents of all files in the archive are mixed together in the content field. E.g. test.zip => test/a.txt, test/b.txt; after `parsed = parser.from_file('test.zip')...
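One possible workaround, sketched with the standard library only: read each archive member separately and parse it on its own, so that results stay keyed by file name. This is an assumption about how one might sidestep the mixing, not a description of tika-python's behavior; `parse_zip_members` and its `parse_buffer` callable are hypothetical (in practice `parse_buffer` could be `tika.parser.from_buffer`).

```python
import io
import zipfile


def parse_zip_members(zip_bytes, parse_buffer):
    """Return {member name: parse result} for every file inside a ZIP archive."""
    results = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries such as "test/"
            # Each member is read and parsed in isolation, so contents never mix.
            results[info.filename] = parse_buffer(archive.read(info))
    return results
```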

The fixes for #167, #124, #225 and #285 only mask the error and never generate the correct Content-Disposition header. With those fixes: when rfc6266 is installed, we get a TypeError as reported...
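For illustration, a Content-Disposition value in the style of RFC 6266 can be built with the standard library alone, pairing an ASCII `filename` fallback with a percent-encoded `filename*` parameter. This is a sketch under that assumption and not the fix proposed in the issue; `content_disposition` is a hypothetical helper name.

```python
from urllib.parse import quote


def content_disposition(filename, disposition="attachment"):
    """Build a header value with an ASCII fallback and a UTF-8 filename* parameter."""
    # Fallback for clients that ignore filename*: non-ASCII characters become "?".
    fallback = filename.encode("ascii", "replace").decode("ascii")
    # Escape backslashes and double quotes inside the quoted-string.
    fallback = fallback.replace("\\", "\\\\").replace('"', '\\"')
    # Percent-encode the UTF-8 bytes of the name for the extended parameter.
    encoded = quote(filename, safe="")
    return "{}; filename=\"{}\"; filename*=UTF-8''{}".format(
        disposition, fallback, encoded
    )
```

Building the value directly avoids depending on an optional package for header generation, at the cost of handling the quoting rules by hand.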

bug
enhancement
help wanted

Checkboxes from Word documents convert to the text "FORMCHECKBOX" and lose any info about whether or not they are checked. Is it possible to render those differently and ideally maintain...