
Batch upload of documents

Open dave90 opened this issue 6 months ago • 0 comments

Description

This pull request introduces a new endpoint for batch uploading files. It accepts multiple files in a single request and returns a dictionary containing the status of each uploaded file. Metadata for each file is passed as a JSON string alongside the form data, keyed by file name. The new endpoint does not affect the existing endpoint or any current functionality.

Example of the usage:

import json

import requests

# Map each file name to its MIME type ("text/plain" is the registered type for .txt)
files_to_upload = {"sample.pdf": "application/pdf", "sample.txt": "text/plain"}

# Every file is sent under the same "files" form field
files = []
for file_name, content_type in files_to_upload.items():
    file_path = f"tests/mocks/{file_name}"
    files.append(("files", (file_name, open(file_path, "rb"), content_type)))


metadata = {
    "sample.pdf":{
        "source": "sample.pdf",
        "title": "Test title",
        "author": "Test author",
        "year": 2020
    },
    "sample.txt":{
        "source": "sample.txt",
        "title": "Test title",
        "author": "Test author",
        "year": 2021
    }
}
    
# upload file endpoint only accepts form-encoded data
payload = {
    "chunk_size": 128,
    "metadata": json.dumps(metadata)
}

response = requests.post(
    "http://localhost:1865/rabbithole/batch",
    files=files,
    data=payload
)
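
The example above opens each file without closing it and relies on the metadata keys matching the uploaded file names. A minimal sketch of a helper that handles both is shown here. The name `build_batch_request`, its parameters, the use of `mimetypes` to guess content types, and the check on metadata keys are assumptions made for illustration and are not part of this pull request.

```python
import json
import mimetypes
from contextlib import ExitStack
from pathlib import Path


def build_batch_request(stack, paths, metadata, chunk_size=128):
    """Build the (files, data) pair for a multipart batch upload.

    Hypothetical helper: `stack` is an ExitStack that owns the opened
    file handles, so they are closed when the `with` block exits.
    """
    names = {Path(p).name for p in paths}
    # Assumption: metadata is keyed by file name, as in the example above.
    unknown = set(metadata) - names
    if unknown:
        raise ValueError(f"metadata refers to files not being uploaded: {sorted(unknown)}")

    files = []
    for path in map(Path, paths):
        # Fall back to a generic type when the extension is not recognized.
        content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        handle = stack.enter_context(path.open("rb"))
        # Every file is sent under the same "files" form field.
        files.append(("files", (path.name, handle, content_type)))

    # Form fields must be plain values, so metadata is serialized to JSON.
    data = {"chunk_size": chunk_size, "metadata": json.dumps(metadata)}
    return files, data


# Usage sketch (requires the `requests` package and a running server):
#
# with ExitStack() as stack:
#     files, data = build_batch_request(
#         stack, ["tests/mocks/sample.pdf", "tests/mocks/sample.txt"], metadata
#     )
#     response = requests.post(
#         "http://localhost:1865/rabbithole/batch", files=files, data=data
#     )
```

Keeping the file handles inside an `ExitStack` means the request must be sent inside the `with` block, while the handles are still open.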

Related to issue #871

Type of change

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] This change requires a documentation update

Checklist:

  • [x] My code follows the style guidelines of this project
  • [x] I have performed a self-review of my own code
  • [x] I have commented my code, particularly in hard-to-understand areas

dave90, Aug 07 '24 19:08