
Implement upload API

psilabs-dev opened this issue 1 year ago • 4 comments

Implementation of an archive upload API, that uploads the content of an archive and all metadata, if any, to the LRR server. (#1005)

While it is possible to write an LRR plugin for a website W, users often rely instead on dedicated, feature-rich, and actively maintained downloaders (e.g. PixivUtil2 for Pixiv) that pull metadata and archives from W. However, moving downloaded content from these downloaders into LRR is still a manual process. In the case of PixivUtil2, metadata is stored in a SQLite database, and the storage format, language, and architecture differ for each downloader.

This API implementation (using multipart/form-data encoding) would let users perform language-agnostic archive migrations or mirrors from their existing archive servers and downloaders to the LRR server, or write comparatively thin HTTP client wrappers around their downloaders that negotiate with and push to LRR at runtime. The user may optionally attach the SHA1 hash of the file so the server can check file integrity and detect corruption in transit.

PUT /api/archives/upload

Return values:

  • 200 success
  • 400 no file attached
  • 409 duplicate
  • 415 unsupported file extension
  • 422 checksum mismatch
  • 500 internal server error

Include in body: file content (binary), filename, title (optional), summary (optional), tags (optional), category ID (optional), plugin calls (?)

Example of a PUT /api/archives/upload in Python with the "example.zip" archive:

import hashlib
import requests

def calculate_sha1(file_path):
    sha1 = hashlib.sha1()
    with open(file_path, 'rb') as fb:
        while chunk := fb.read(8192):
            sha1.update(chunk)
    return sha1.hexdigest()

file_path = "example.zip" # required
file_name = "example.zip" # required; the name under which the file is stored on the server
upload_mime = "application/zip"

# metadata
title = "example title"
tags = "example_date,key:value,artist:example artist"
summary = "example summary"
category_id = None

# calculate checksum (optional)
file_checksum = calculate_sha1(file_path)

data = {
    "title": title,
    "tags": tags,
    "summary": summary,
    "file_checksum": file_checksum
}
if category_id is not None: # only send the optional category ID when set
    data["category_id"] = category_id

with open(file_path, 'rb') as fb:
    files = {'file': (file_name, fb, upload_mime)}
    response = requests.put(
        "http://localhost:3000/api/archives/upload",
        files=files,
        data=data
    )
    print(response.json())
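Since the proposal enumerates specific return values, a thin client wrapper can branch on them. The mapping below is a sketch: `handle_upload_response` is a hypothetical helper, and the suggested actions are assumptions, not part of the proposal:

```python
def handle_upload_response(status_code: int) -> str:
    """Map the proposed upload API status codes to a client-side action."""
    actions = {
        200: "success",
        400: "no file attached: attach the file part and retry",
        409: "duplicate: archive already exists on the server, skip",
        415: "unsupported file extension: convert or skip",
        422: "checksum mismatch: re-read the file and retry the upload",
        500: "internal server error: inspect server logs",
    }
    return actions.get(status_code, "unexpected status")
```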

This is a synchronous API implementation. In theory the upload could also run as a Minion job, but it is not a long-running task and doesn't justify creating a job; error handling is also more straightforward synchronously, and the API should not receive many concurrent requests to begin with (TBD).

Behavior notes:

if the upload API is called while a metadata plugin is configured to run automatically, metadata from the plugin will override (or append to) the metadata provided in the API request.
  • only one upload can be executed at a time.

psilabs-dev • Oct 05 '24 23:10