Uploading Multiple Files as ArrayBuffer

Open lawfulsoftware opened this issue 10 months ago • 4 comments

This is a feature request.

I am unable to upload multiple files using n8n. Because of the way n8n handles binary data, I cannot find a way to send multiple files as a binary array in a single request (e.g., to the merge-pdfs endpoint). The same limitation could also be an issue in a serverless environment.

Is it possible to modify or enhance the methods for passing files to Stirling-PDF?

Possible Strategies

URL Array

The user can send an array of URLs so Stirling-PDF can download the PDFs that need to be processed.

This approach keeps the API call itself small, so it transmits quickly. As long as the URLs are well-formed, Stirling-PDF can accept the request. It avoids sending a single large payload (e.g., 75 MB) that can fail during transmission.

Stirling-PDF can download the files in parallel, whereas a single binary array is transmitted sequentially as one request. Stirling-PDF can also retry any failed download without losing the files that it has already downloaded.
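The parallel download with per-file retry described above could look roughly like the following sketch. It is only an illustration of the idea, not Stirling-PDF code: the function names (`fetch_url`, `fetch_with_retry`, `download_all`) and the `attempts` and `max_workers` parameters are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Tuple
from urllib.request import urlopen


def fetch_url(url: str, timeout: float = 30.0) -> bytes:
    """Download one URL and return the response body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_retry(url: str, fetch: Callable[[str], bytes], attempts: int = 3) -> bytes:
    """Call fetch(url) up to `attempts` times and re-raise the last error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return fetch(url)
        except Exception as error:  # a real service would catch narrower errors
            last_error = error
    raise last_error


def download_all(
    urls: List[str],
    fetch: Callable[[str], bytes] = fetch_url,
    attempts: int = 3,
    max_workers: int = 4,
) -> Tuple[Dict[str, bytes], Dict[str, str]]:
    """Download all URLs in parallel.

    Returns (files, errors), both keyed by URL. A failed URL does not
    discard the files that were downloaded successfully.
    """
    files: Dict[str, bytes] = {}
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {url: pool.submit(fetch_with_retry, url, fetch, attempts) for url in urls}
        for url, future in futures.items():
            try:
                files[url] = future.result()
            except Exception as error:
                errors[url] = str(error)
    return files, errors
```

Passing the downloader in as `fetch` keeps the retry logic independent of the HTTP client, so the same function can be exercised without network access.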

Task Approach

An API endpoint can allow the user to declare a taskId (e.g., a uuid) and associate uploaded files with that taskId. The user can then reference the taskId in calls to the various API endpoints and the endpoint will process all files associated with that taskId.

Optionally, the user can declare the number of files that will be sent for this taskId. If an endpoint is called before all of them have arrived, Stirling-PDF can return an error along with an array of the filenames that it has already received.
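A minimal in-memory sketch of this task idea follows. Everything in it is an assumption made for illustration (the `TaskStore` class, its method names, and the `IncompleteTaskError` exception are hypothetical); a real implementation would sit behind HTTP endpoints and use persistent storage.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class IncompleteTaskError(Exception):
    """Raised when a task is used before all declared files have arrived."""

    def __init__(self, expected: int, received: List[str]):
        super().__init__(f"expected {expected} files, received {len(received)}")
        self.expected = expected
        self.received = received  # filenames uploaded so far


@dataclass
class Task:
    expected_files: Optional[int] = None
    files: Dict[str, bytes] = field(default_factory=dict)


class TaskStore:
    """Associates uploaded files with a taskId (hypothetical helper)."""

    def __init__(self) -> None:
        self._tasks: Dict[str, Task] = {}

    def create_task(self, expected_files: Optional[int] = None,
                    task_id: Optional[str] = None) -> str:
        # The caller may supply its own id; otherwise generate a UUID.
        task_id = task_id or str(uuid.uuid4())
        self._tasks[task_id] = Task(expected_files=expected_files)
        return task_id

    def add_file(self, task_id: str, filename: str, data: bytes) -> None:
        self._tasks[task_id].files[filename] = data

    def files_for(self, task_id: str) -> Dict[str, bytes]:
        """Return all files for the task, or raise if some are still missing."""
        task = self._tasks[task_id]
        if task.expected_files is not None and len(task.files) < task.expected_files:
            raise IncompleteTaskError(task.expected_files, sorted(task.files))
        return dict(task.files)
```

With this shape, each file can be uploaded in its own small request, and an endpoint such as merge-pdfs would only need the taskId.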

ZIP Approach

Receive a ZIP file containing all of the PDFs that need to be processed.
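On the receiving side, unpacking such an archive could be as simple as the sketch below, which uses only the Python standard library. The function name `pdfs_from_zip` is hypothetical, and filtering by the `.pdf` extension is an assumption of this example rather than a stated requirement.

```python
import io
import zipfile
from typing import Dict


def pdfs_from_zip(zip_bytes: bytes) -> Dict[str, bytes]:
    """Return the PDF members of an in-memory ZIP archive, keyed by member name."""
    pdfs: Dict[str, bytes] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            # Skip directories and anything that does not look like a PDF.
            if info.is_dir() or not info.filename.lower().endswith(".pdf"):
                continue
            pdfs[info.filename] = archive.read(info)
    return pdfs
```

A production version would also want to cap the uncompressed size of each member before reading it, since ZIP archives can expand far beyond their transmitted size.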

Ingest Endpoint

Rather than each endpoint having to handle receiving data, a single API endpoint could ingest data using one or more of the above approaches. This would consolidate the process, which may simplify the application logic and enhance maintainability. Optionally, the user could set a TTL to override any system defaults.

Files ingested by this single endpoint can be used by multiple API endpoints without Stirling-PDF having to receive the same data for each API call. To achieve this, the user could reference the taskId when calling the Pipeline API.
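The TTL behaviour could be sketched as follows: files are stored once under a taskId, can be read by any number of later calls, and disappear after the system default TTL or a per-task override. The `IngestStore` class, its methods, and the one-hour default are hypothetical choices for this example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class IngestStore:
    """Holds ingested files per taskId until their TTL expires (hypothetical)."""

    def __init__(self, default_ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._default_ttl = default_ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Dict[str, bytes]]] = {}

    def ingest(self, task_id: str, files: Dict[str, bytes],
               ttl: Optional[float] = None) -> None:
        """Store files under task_id; `ttl` overrides the system default."""
        lifetime = self._default_ttl if ttl is None else ttl
        self._entries[task_id] = (self._clock() + lifetime, dict(files))

    def get(self, task_id: str) -> Dict[str, bytes]:
        """Return the files for task_id, or raise KeyError if unknown or expired."""
        expires_at, files = self._entries[task_id]
        if self._clock() >= expires_at:
            del self._entries[task_id]
            raise KeyError(task_id)
        return files
```

Because `get` does not consume the entry, several endpoints (or several Pipeline steps) could read the same ingested files until the TTL runs out.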

n8n References:

n8n Data Structure
n8n Binary Data

lawfulsoftware, Apr 03 '24 19:04