Sambit Kumar Dash issues

Results: 22 issues of Sambit Kumar Dash

Helper methods for extracting RNN final state in a GPU compatible way

In autoencoder designs, one may need to extract the final state of an RNN and populate it on a repeated grid to pass to the decoder: ``` repeat_states(x) = repeat(x[:,end],...

Improve the performance of `pdPageExtractText`

The `pdPageExtractText` API is one of the core APIs of PDFIO. However, a large number of small allocations makes it somewhat slow. This code needs to be refactored to ensure the text...

performance

Table picker for PDF

Natural tabular objects in a PDF document should ideally be picked up for extraction. The intent of the project is API development, hence it will be headless for the most part....

enhancement

`pdPageExtractText` should support multi-column documents

This implementation may need to be reviewed along with #2. Although there may not be an exact overlap, in some cases the implementation logic can be similar.

enhancement

Support for JPEG filter

Content filters for JPEG and JPEG2000 should be supported. Since these are special types of filters, it should be reviewed whether decoding is preferable to streaming the data directly into the graphics channel for rendering.

enhancement

Normalize with SASLPrep for PDF passwords

SASLPrep can be implemented using the Unicode Consortium supplied libraries: http://site.icu-project.org/ but I guess this may be an unnecessary added dependency. An enhancement request has been raised to include the feature in...

Implement Filespec properly to address the EFF attribute of the security handler

The crypto code decrypts the streams by recursively accessing the indirect objects. For external files it may not be easy to determine whether a file stream is an embedded file from the...

Better support for T3 fonts

Some PDF files use T3 fonts that do not have an embedded ToUnicode mapping. Text set in such fonts cannot be extracted from the document effectively. In such cases, usage of OCR...

enhancement

Full PostScript parser / execution engine for font files and CMap reading

This may also be picked up from a PostScript renderer such as the Cairo project. Currently, `Cairo.jl` does not expose such low-level APIs.

enhancement