Sambit Kumar Dash
@hhaensel thank you for your interest. I want to understand how complex a case this software can handle. If you submit a PR, I can review it and let...
The file is corrupt. A PDF file must start with %PDF and end with %%EOF. While some readers take a lenient stance on this, one cannot say that is the...
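The start/end rule above can be checked before handing a file to a parser. What follows is only a minimal sketch using Julia's standard library, not a `PDFIO` function: the name `looks_like_pdf` is hypothetical, and ignoring whitespace after `%%EOF` is an assumption made here for convenience.

```julia
# Hypothetical helper, not part of PDFIO: a strict structural check that the
# data starts with %PDF and that %%EOF is the last non-whitespace token.
function looks_like_pdf(data::Vector{UInt8})
    header = b"%PDF"
    trailer = b"%%EOF"
    length(data) >= length(header) + length(trailer) || return false

    # The header must sit at the very first byte.
    data[1:length(header)] == header || return false

    # Assumption: bytes such as NUL, tab, LF, FF, CR and space after %%EOF are ignored.
    stop = length(data)
    while stop > 0 && data[stop] in (0x00, 0x09, 0x0a, 0x0c, 0x0d, 0x20)
        stop -= 1
    end
    stop >= length(trailer) || return false
    return data[stop-length(trailer)+1:stop] == trailer
end

# Convenience method that reads the whole file into memory first.
looks_like_pdf(path::AbstractString) = looks_like_pdf(read(path))
```

A file that fails this check is one of the cases where, as noted above, lenient readers may still open it but a strict parser makes no guarantee.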
These files do not conform to the PDF spec. So technically, the behavior of a parser on corrupt files cannot be guaranteed and should not be fixed in a hurry....
@aminya thanks a lot for your interest in the `PDFIO` project. While I would love to see the project become a PDF writer, I personally cannot devote any time to it...
@aminya making `PDFIO` part of the `FileIO` interface, which most JuliaIO projects are based on, has its own challenges. Most `FileIO` projects load a file and read it from beginning...
`PDFIO` is a somewhat lower-level API than Taro in this respect. It deals with each PDF page separately, so you may need a few extra lines of code. The...
Not as an API. However, it's not hard to implement or extend `pdPageExtractText` for these purposes. If you plan to submit a PR, please feel free to do so.
`pdPageEvalContent` is essentially the method that evaluates the content stream stack and populates intermediate values into the graphics state. This stack / state is called `GState`. You have...
You can initialize `:clipping_rect` in `pdPageExtractText`. You can go to this location: https://github.com/sambitdash/PDFIO.jl/blob/95000b69625cfbd51cf7825470def0d4df9192aa/src/PDPageElement.jl#L653 This code will, for example, exclude all italic fonts. ``` if !get(state, :in_artifact, false) && !pdFontIsItalic(font) tl...
Now that it worked, you can modify `pdPageExtractText` so that it takes a clipping rectangle path or certain font characteristics as an input parameter and submit...
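To make the idea of such a parameter concrete, here is a rough sketch of the filtering decision on its own, written in plain Julia. Everything in it is hypothetical: `Rect`, `TextRun`, `inside`, and `keep_run` are illustration-only names and are not `PDFIO` types or functions. How the position and font of a text run would actually be obtained from `GState` is left out.

```julia
# Hypothetical types for illustration only; these are not PDFIO definitions.
struct Rect
    xmin::Float64
    ymin::Float64
    xmax::Float64
    ymax::Float64
end

struct TextRun
    text::String
    x::Float64
    y::Float64
    italic::Bool
end

# True when the point (x, y) lies within the rectangle, edges included.
inside(r::Rect, x, y) = r.xmin <= x <= r.xmax && r.ymin <= y <= r.ymax

# Decide whether a run should be extracted. With no clip and skip_italic=false
# every run is kept, which mirrors an unfiltered extraction.
function keep_run(run::TextRun; clip::Union{Rect,Nothing}=nothing, skip_italic::Bool=false)
    skip_italic && run.italic && return false
    clip === nothing && return true
    return inside(clip, run.x, run.y)
end

# Made-up sample data: a title, an italic aside, and a footer near the page bottom.
runs = [TextRun("Title", 72.0, 700.0, false),
        TextRun("aside", 72.0, 650.0, true),
        TextRun("footer", 72.0, 20.0, false)]
body = Rect(0.0, 50.0, 612.0, 792.0)
kept = [r.text for r in runs if keep_run(r; clip=body, skip_italic=true)]
```

Keeping both filters as optional keyword arguments means existing callers would see no change in behavior, which may make a PR along these lines easier to review.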