trt-llm-rag-windows
Enhancement: Support for reading PDFs with partial DRM (AES) - include PyCryptodome dependency
Description
When attempting to read PDF files with partial DRM (for example, files that allow Printing, Content Copying, and Content Copying for Accessibility), loading local files fails with the following error message:
"Failed to load file <filename.pdf> with error: PyCryptodome is required for AES algorithm. Skipping..."
This happens because the PyCryptodome library, which is needed to handle the AES encryption used by these DRM features, is not installed.
Expected Behavior
The project should be able to read PDF files, including those with partial DRM, without raising errors caused by missing cryptographic support. Users should be able to process such PDFs for legitimate use cases, such as reading text for accessibility purposes, where that use complies with what the DRM allows. Note: if a restriction does prevent reading the file, an error should still be raised stating that the document's DRM permissions do not allow it to be read.
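A minimal sketch of the permission check described above, assuming the integer `/P` entry from the PDF's `/Encrypt` dictionary has already been read. The bit positions follow the PDF 1.7 specification (ISO 32000-1, standard security handler user access permissions). `DRMPermissionError`, `decode_permissions`, and `check_readable` are hypothetical names for illustration, not existing project code:

```python
# Permission bits in the /P entry (1-indexed bit positions per ISO 32000-1).
PRINT = 1 << 2                  # bit 3: print the document
COPY_EXTRACT = 1 << 4           # bit 5: copy or otherwise extract text and graphics
EXTRACT_ACCESSIBILITY = 1 << 9  # bit 10: extract for accessibility (deprecated in PDF 2.0)


class DRMPermissionError(PermissionError):
    """Hypothetical error raised when the DRM permissions forbid reading a PDF."""


def decode_permissions(p: int) -> dict:
    """Decode the /P value into the flags relevant to reading text."""
    p &= 0xFFFFFFFF  # /P is stored as a signed 32-bit integer; normalise to unsigned bits
    return {
        "print": bool(p & PRINT),
        "copy_extract": bool(p & COPY_EXTRACT),
        "extract_for_accessibility": bool(p & EXTRACT_ACCESSIBILITY),
    }


def check_readable(p: int, filename: str, for_accessibility: bool = False) -> None:
    """Raise DRMPermissionError unless the permissions allow text extraction."""
    flags = decode_permissions(p)
    allowed = flags["copy_extract"] or (
        for_accessibility and flags["extract_for_accessibility"]
    )
    if not allowed:
        raise DRMPermissionError(
            f"The DRM permissions of {filename} do not allow reading of this document."
        )
```

With this split, decryption (which needs an AES implementation such as PyCryptodome) and the permission decision stay separate: a missing library never masquerades as a DRM restriction, and a real restriction still produces an explicit error.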
Actual Behavior
When attempting to read a PDF with partial DRM, the process is aborted because the PyCryptodome dependency is missing, and the file cannot be read or processed further.
Steps to Reproduce
1. Attempt to read a PDF file with partial DRM using the project.
2. Observe the error message indicating that PyCryptodome is missing for AES algorithm support.
Suggested Enhancement
To resolve this issue and allow a wider range of PDF files to be read, we suggest adding PyCryptodome as a dependency in the project's Python requirements.
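Alongside listing `pycryptodome` in the requirements, a startup check could report the missing package once instead of failing file by file. This is a sketch under assumptions: `missing_dependencies` and `warn_if_missing` are hypothetical helpers, and the mapping relies on PyCryptodome installing the top-level `Crypto` package:

```python
import importlib.util

# Importable module name -> pip package name that provides it.
# PyCryptodome installs its code under the top-level package "Crypto".
OPTIONAL_CRYPTO_DEPS = {"Crypto": "pycryptodome"}


def missing_dependencies(deps: dict) -> list:
    """Return the pip package names whose top-level modules cannot be found."""
    return [pkg for module, pkg in deps.items() if importlib.util.find_spec(module) is None]


def warn_if_missing(deps: dict = OPTIONAL_CRYPTO_DEPS) -> list:
    """Print an install hint for any missing optional dependency and return the list."""
    missing = missing_dependencies(deps)
    if missing:
        print("AES-encrypted PDFs will be skipped; install with: pip install " + " ".join(missing))
    return missing
```

`find_spec` only locates the module without importing it, so the check is cheap to run at startup.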
Additional Context
The ability to read PDFs with partial DRM is crucial for various legitimate use cases, including accessibility and content analysis. In these cases the user is not infringing on copyright or DRM protections but simply accessing the content in a way the DRM allows (e.g., reading for visually impaired users), or in a way where the necessary legal references are provided in the user's own document.
I'm facing this problem too!
Same situation here, and I agree with the statement in the "Additional Context" section above.
@montge thanks, we will consider this feature request.
@montge any ETA for this requirement?