tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.

Results 48 tika-python issues

[MainThread ] [WARNI] Tika server returned status: 405 Traceback (most recent call last) Every time I try to load python it returns this same error. Even I try to assign...

**My PDF original text screenshot:** ![image](https://user-images.githubusercontent.com/56106630/180738889-76b47696-6b4e-4cb3-b456-d3738b8b253e.png) **Result of extraction:** ![image](https://user-images.githubusercontent.com/56106630/180738923-a1759c30-ca92-481c-9601-1d8f354ae828.png) Is there any setting to extract the exact line as `2nd of March 2015 onwards` rather than splitting it into...

I'm trying to extract text from a large PDF using this code (my file comes from a blob on Azure; the PDF is 7.3 MB, it has 140 pages and...

Hi, I get this error when parsing a PDF using Tika ![error Tika Server](https://user-images.githubusercontent.com/23418370/167767008-3605f0f1-8b2e-498c-995f-39edf2a795de.png) To overcome this issue, I've tried:

- setting os.environ['TIKA_SERVER_JAR'] to downloaded tika-server.jar
- setting os.environ['TIKA_PATH'] to folder...
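For readers trying the same workaround, here is a minimal sketch of setting the two variables named above. It assumes that tika-python reads `TIKA_SERVER_JAR` and `TIKA_PATH` when the `tika` package is first imported, so they should be set before that import. The helper name `configure_local_tika` and the example path are hypothetical, and this is not confirmed to resolve the error in the screenshot.

```python
import os
from pathlib import Path


def configure_local_tika(jar_path):
    """Point tika-python at an already downloaded tika-server.jar.

    Hypothetical helper: it only sets environment variables and does
    not import tika or start a server.
    """
    jar = Path(jar_path).resolve()
    # Assumption: TIKA_SERVER_JAR is given as a URL, so a local file
    # is passed as a file:// URI.
    os.environ["TIKA_SERVER_JAR"] = jar.as_uri()
    # Assumption: TIKA_PATH is the folder that holds the jar.
    os.environ["TIKA_PATH"] = str(jar.parent)
    return os.environ["TIKA_SERVER_JAR"]


# Call this before `import tika` / `from tika import parser`.
configure_local_tika("/tmp/tika/tika-server.jar")
```

Setting the variables inside the script, rather than in the shell, keeps the configuration next to the code that depends on it, at the cost of requiring the correct import order.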

I am facing a problem while extracting content from a PDF: the returned content is None in the case of PDF images. The same code seems to be working on my local setup...

Hi, I tried to use tika-python in **AWS Lambda** using a Docker container image, but it throws an error. I tried installing Java. It is not working. Can...

Why am I getting this message when I try to use tika?

```
Traceback (most recent call last):
  File "/usr/share/pyshared/test.py", line 124, in <module>
    raw = parser.from_file(str(path))
  File "/home/fred/.local/lib/python3.8/site-packages/tika/parser.py", line 40, in...
```

Can someone assist? I am trying to get tika-python to return JSON with metadata and text when using the Docker image of Tika. I can get the results I want...
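One way this setup is often approached is to run tika-python in client-only mode against the container's endpoint. The sketch below assumes a Tika server container is already listening on `http://localhost:9998` and that the `TIKA_CLIENT_ONLY` variable stops tika-python from starting its own Java server. The function name `parse_via_docker_tika` and the output keys are hypothetical, and this is not taken from the issue itself.

```python
import os

# Assumption: any non-empty value puts tika-python in client-only mode.
os.environ.setdefault("TIKA_CLIENT_ONLY", "True")


def parse_via_docker_tika(path, endpoint="http://localhost:9998"):
    """Return metadata and text for `path` from a remote Tika server.

    Hypothetical helper around tika-python's parser.from_file.
    """
    # Imported here so the environment variable above is set first.
    from tika import parser

    parsed = parser.from_file(path, serverEndpoint=endpoint)
    # from_file returns a dict; "content" may be None when the server
    # extracts no text.
    return {
        "metadata": parsed.get("metadata"),
        "text": parsed.get("content"),
    }
```

The returned dictionary can be passed to `json.dumps` if a JSON string is needed, provided the metadata values are JSON-serializable.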

Hi, I am posting a file into my DB using Django as the backend framework. I would like to read the data from the file whilst parsing. However, I am getting...

Sorry for such a general issue, but I have been trying hard to extract metadata (Author, Title, Abstract) from a PDF using the Tika-Python client. Unfortunately, it is not able to...