
Enhancement: Support for reading PDFs with partial DRM (AES) - include PyCryptodome dependency

Open montge opened this issue 1 year ago • 4 comments

Description: Reading local PDF files that have partial DRM permissions (e.g., Printing, Content Copying, and Content Copying for Accessibility allowed) fails with the following error message: "Failed to load file <filename.pdf> with error: PyCryptodome is required for AES algorithm. Skipping..." The error occurs because the PyCryptodome library, which is needed to handle the AES encryption used by these DRM features, is not installed.

Expected Behavior: The project should be able to read PDF files, including those with partial DRM permissions, without errors caused by missing cryptographic support. Users should be able to process such PDFs for legitimate use cases, such as reading text for accessibility purposes, where the use complies with what the DRM allows. Note: if a restriction does prevent reading the file, an error should still be raised stating that the document's DRM permissions do not allow it to be read.

Actual Behavior: When attempting to read a PDF with partial DRM permissions, loading is aborted because the PyCryptodome dependency is missing, and the file cannot be read or processed further.

Steps to Reproduce:
1. Attempt to read a PDF file with partial DRM permissions using the project.
2. Observe the error message indicating that PyCryptodome is missing for AES algorithm support.

Suggested Enhancement: To resolve this issue and allow a wider range of PDF files to be read, include PyCryptodome as a dependency/requirement of the project's Python implementation.
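Until the dependency ships with the project, a quick way to confirm whether the environment is affected is to check if PyCryptodome can be imported. The snippet below is only a rough sketch and not part of the project: the helper names `aes_backend_available` and `describe_aes_support` are made up for illustration, and it assumes the standard PyCryptodome distribution, which is installed as `pycryptodome` and imported as `Crypto`.

```python
import importlib.util


def aes_backend_available(module_name: str = "Crypto") -> bool:
    """Return True if the named top-level module can be found.

    "Crypto" is the import name used by the `pycryptodome` package
    (assumption: the standard distribution, not `pycryptodomex`).
    """
    return importlib.util.find_spec(module_name) is not None


def describe_aes_support() -> str:
    """Build a short, human-readable status message (hypothetical helper)."""
    if aes_backend_available():
        return "PyCryptodome found: the AES dependency is installed."
    return "PyCryptodome missing: try `pip install pycryptodome`."


if __name__ == "__main__":
    print(describe_aes_support())
```

If the check reports the package as missing, installing `pycryptodome` into the same Python environment the project uses may be enough as a workaround. I have not verified this against every PDF.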

Additional Context: The ability to read PDFs with partial DRM is crucial for various legitimate use cases, including accessibility and content analysis. In these cases the user is not infringing on the copyright or DRM protections but merely accessing the content in a manner that the DRM allows (e.g., reading for visually impaired users), or where legal and necessary references are provided in the document.

montge avatar Feb 16 '24 02:02 montge

I'm facing this problem too!

Jason-XII avatar Feb 16 '24 04:02 Jason-XII

Same situation and I agree with the statement in the "Additional Context" section above.

dayjobtitus avatar Feb 19 '24 21:02 dayjobtitus

@montge thanks, we will consider this feature request.

kedarpotdar-nv avatar Feb 26 '24 07:02 kedarpotdar-nv

@montge any ETA for this requirement?

sarsharoid avatar Feb 27 '24 23:02 sarsharoid