Sambit Kumar Dash
In autoencoder designs, one may need to extract the final state of the RNN and repeat it across a grid to pass to the decoder: ``` repeat_states(x) = repeat(x[:,end],...
The `pdPageExtractText` API is one of the core APIs of PDFIO. However, a large number of small allocations makes it a bit slow. This code needs to be refactored to ensure the text...
Natural tabular objects in a PDF document should ideally be picked up for extraction. The intent of the project is API development, hence it will be headless for the most part....
This implementation may need to be reviewed along with #2. Although there may not be an exact overlap in some cases, the implementation logic can be similar.
Content filters for JPEG and JPEG2000 should be supported. Since these are special filter types, whether to decode them or stream them directly into the graphics channel for rendering should be reviewed.
SASLPrep can be implemented using the libraries supplied by the Unicode Consortium (http://site.icu-project.org/), but I guess this may be an unnecessary added dependency. An enhancement request has been raised to include the feature in...
The crypto code decrypts the streams by recursively accessing the indirect objects. For external files it may not be easy to determine whether a file stream is an embedded file from the...
Some PDF files use Type 3 (T3) fonts that do not have an embedded ToUnicode mapping. Text set in such fonts cannot be extracted from the document effectively. In such cases, usage of OCR...
This may be picked up from a PostScript renderer like the Cairo project as well. Currently, `Cairo.jl` does not expose such low-level APIs.