Samkit Jain

15 comments by Samkit Jain

Hi @Godlikemandyy Appreciate your interest in the library. 1. The text in the red boxes that you say is missing: can you please confirm whether that text is copyable or not?...

1. Thanks for checking. That's the reason pdfplumber missed reading that text. 2. I usually do it by finding the bounding boxes of the characters that weren't in the proper...

@sreeni5493 I was able to distinguish between the dotted and non-dotted lines using the `stroking_color` property of an edge. For dotted lines, the `stroking_color` is `[1]` ```python im.draw_lines([e for e...

This should give you an idea of how to proceed with distinguishing between different types of lines. You may also try saving all the objects as a CSV using https://github.com/jsvine/pdfplumber#basic-example...
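A minimal sketch of the filtering idea from the comment above, assuming each edge is a plain dict with a `stroking_color` key and that `[1]` marks a dotted line, as it did in that particular PDF. The helper names and the sample edges are hypothetical and only illustrate the approach; other PDFs may encode dotted lines differently.

```python
def normalize_color(color):
    """Return the color as a tuple so lists, tuples and scalars compare equal."""
    if color is None:
        return None
    if isinstance(color, (list, tuple)):
        return tuple(color)
    return (color,)


def split_edges_by_stroke(edges, dotted_color=(1,)):
    """Split edges into (dotted, solid) based on their stroking color.

    Assumption: dotted lines share one stroking color (here [1], as reported
    for the PDF discussed above). Inspect your own PDF before relying on it.
    """
    target = normalize_color(dotted_color)
    dotted, solid = [], []
    for edge in edges:
        if normalize_color(edge.get("stroking_color")) == target:
            dotted.append(edge)
        else:
            solid.append(edge)
    return dotted, solid


# Hypothetical sample data shaped like edge dicts; not taken from a real PDF.
sample_edges = [
    {"x0": 10, "x1": 200, "top": 50, "bottom": 50, "stroking_color": [1]},
    {"x0": 10, "x1": 200, "top": 80, "bottom": 80, "stroking_color": [0]},
    {"x0": 10, "x1": 10, "top": 50, "bottom": 80, "stroking_color": (1,)},
]

dotted_edges, solid_edges = split_edges_by_stroke(sample_edges)
print(len(dotted_edges), "dotted,", len(solid_edges), "solid")
```

With pdfplumber, the same function could probably be applied to `page.edges`, and either list passed to `im.draw_lines(...)` to check the split visually.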

Hi @jsvine This is very insightful. Yes, it could be that the font name was in a non-UTF-8 format, say UTF-16. I also tried repairing the PDF using Ghostscript but...

Hi @colemanr03 Appreciate your interest in the library. Could you please provide more details like the version of pdfplumber you are using, the PDF (redacting any sensitive information) that is...

Hi @ozlem-atiz This repo is not designed for recognising continuous handwritten text (feel free to raise a PR that adds support for it). If your text is not continuous or...

@micmalti I was able to resolve this issue by repairing the PDF via Ghostscript. Command I ran:

```
gs -o "output.pdf" -sDEVICE=pdfwrite input.pdf
```

[The repaired PDF.](https://github.com/pdfminer/pdfminer.six/files/5232199/ttt1.pdf) Is this something...
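For batch use, the Ghostscript command quoted above could be wrapped in a small script. This is only a sketch: the function names are hypothetical, and it assumes the Ghostscript executable is installed and available on `PATH` as `gs`. The arguments are the same ones used in the comment.

```python
import shutil
import subprocess


def build_gs_repair_command(input_path, output_path, gs_binary="gs"):
    """Build the argument list for the repair command quoted in the comment."""
    return [gs_binary, "-o", output_path, "-sDEVICE=pdfwrite", input_path]


def repair_pdf(input_path, output_path, gs_binary="gs"):
    """Rewrite a PDF through Ghostscript's pdfwrite device.

    Assumption: `gs_binary` is on PATH. Raises FileNotFoundError if it is
    missing and CalledProcessError if Ghostscript exits with an error.
    """
    if shutil.which(gs_binary) is None:
        raise FileNotFoundError(f"Ghostscript executable not found: {gs_binary}")
    subprocess.run(
        build_gs_repair_command(input_path, output_path, gs_binary),
        check=True,
    )
    return output_path


# Show the command without running it, so this works without Ghostscript.
command = build_gs_repair_command("input.pdf", "output.pdf")
print(" ".join(command))
```

Calling `repair_pdf("input.pdf", "output.pdf")` would then run the command. Whether the rewritten file resolves a given parsing problem depends on the PDF.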

I too encountered the same. Have fixed in #7

@umihico Is it safe to flush the temporary storage? Since AWS Lambda reuses the storage, and multiple invocations are running in parallel, wouldn't this cause unexpected issues? For anyone facing...