import-paperless script fails due to mixed API paths (/api vs /app/api) and async processing in Docspell 0.43

Open voyager opened this issue 3 months ago • 1 comments

Hi Docspell team,

I'm migrating from Paperless using tools/import-paperless.sh (originally written for Docspell 0.3 beta) on a UGREEN NAS (Docker Compose) and encountered several issues with the current import script:

Mixed API base paths

- Open auth endpoint works at root: POST http://localhost:7880/api/v1/open/auth/login → 200 OK
- All secured endpoints require the /app prefix: http://localhost:7880/app/api/v1/sec/...
- Using /api/v1/sec/... (without /app) → 404 Not found
Conclusion: Login uses /api/v1/open/auth/login, all secured endpoints must use /app/api/v1/sec/...
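
One way to keep the path split out of the individual curl calls is a small URL helper. This is only a sketch: api_url and the DOCSPELL_BASE, OPEN_PREFIX and SEC_PREFIX variables are hypothetical names, not part of the original script. The secured prefix defaults to what was observed on this install and can be overridden, since the maintainer's reply reports /api/v1/sec on a standard setup.

```shell
# Hypothetical helper: builds endpoint URLs from two configurable prefixes.
DOCSPELL_BASE="${DOCSPELL_BASE:-http://localhost:7880}"
OPEN_PREFIX="${OPEN_PREFIX:-/api/v1/open}"
# Default matches this report; set SEC_PREFIX=/api/v1/sec if /app is not needed.
SEC_PREFIX="${SEC_PREFIX:-/app/api/v1/sec}"

# Usage: api_url open auth/login   or   api_url sec upload/item
api_url() {
  local kind="$1" path="$2"
  case "$kind" in
    open) printf '%s%s/%s\n' "$DOCSPELL_BASE" "$OPEN_PREFIX" "$path" ;;
    sec)  printf '%s%s/%s\n' "$DOCSPELL_BASE" "$SEC_PREFIX" "$path" ;;
    *)    echo "unknown endpoint kind: $kind" >&2; return 1 ;;
  esac
}
```

With this in place a call such as curl ... "$(api_url sec organization)" picks up whichever prefix the installation actually needs.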

Asynchronous document processing not handled

Upload endpoint /app/api/v1/sec/upload/item returns success immediately but documents are queued for processing
The script attempts to set metadata immediately after upload, but documents don't have IDs yet
Processing can take 15-30+ minutes per document (OCR, NLP analysis, etc.)
Solution: Implemented two-pass approach:
- Pass 1 (mode=upload): Upload all documents quickly
- Wait for processing queue to complete
- Pass 2 (mode=metadata): Apply all metadata using /checkfile/{checksum} to get document IDs
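
The "wait for processing" step between the two passes could be sketched as a polling loop. The names fetch_checkfile and wait_for_item, the base_url and token variables, and the default of 60 attempts at 30-second intervals are all assumptions for illustration; the grep pattern only looks for "exists": true in the response and does not otherwise parse the JSON.

```shell
# Hypothetical wrapper around the checkfile call; expects $base_url and $token
# to be set by the caller.
fetch_checkfile() {
  curl -s -H "X-Docspell-Auth: $token" \
    "$base_url/app/api/v1/sec/checkfile/$1"
}

# Poll until the response reports "exists": true, or give up.
# Args: checksum, max attempts (default 60), delay in seconds (default 30).
wait_for_item() {
  local checksum="$1" attempts="${2:-60}" delay="${3:-30}" i=1
  while [ "$i" -le "$attempts" ]; do
    if fetch_checkfile "$checksum" |
        grep -Eq '"exists"[[:space:]]*:[[:space:]]*true'; then
      return 0
    fi
    # Sleep only if another attempt follows.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

Pass 2 would then call wait_for_item "$checksum" before applying metadata to a document, and skip or report the document if it returns non-zero.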

Upload payload format

The script was using file=@path but web UI uses file[]=@path with specific metadata JSON
Required metadata structure: {"multiple":true,"flattenArchives":false,"direction":"incoming","folder":null,"skipDuplicates":true,"tags":null,"fileFilter":null,"language":null,"attachmentsOnly":null,"customData":null}

Shell quoting issues

Original script had nested quote problems and missing fields (e.g., "use":"correspondent" for organizations)
Fixed by using jq -n to build JSON payloads safely

Environment

Platform: UGREEN NAS (Docker Compose)
Docspell version: 0.43.0
UI base: http://localhost:7880/app/dashboard
Auth endpoint: /api/v1/open/auth/login
Secured endpoints: /app/api/v1/sec/...

Working solution summary

Authentication:

payload=$(printf '{"account":"%s","password":"%s"}' "$user" "$password")
curl -s -X POST -H 'Content-Type: application/json' \
  -d "$payload" "http://localhost:7880/api/v1/open/auth/login"
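
The later snippets use $token without showing where it comes from. A minimal sketch of obtaining it follows, assuming the login response is a JSON object with "success" and "token" fields; extract_token and login are hypothetical helper names. The request body is built with jq here because a printf-built payload breaks if the password contains a double quote or a backslash.

```shell
# Reads a login response on stdin and prints the token.
# Assumes the response has "success" and "token" fields.
# Exits non-zero if login failed or no token is present.
extract_token() {
  jq -er 'select(.success == true) | .token // empty'
}

# Args: base URL, account, password. Prints the token on success.
login() {
  local base_url="$1" user="$2" password="$3"
  jq -cn --arg account "$user" --arg password "$password" \
      '{account: $account, password: $password}' |
    curl -s -X POST -H 'Content-Type: application/json' \
      -d @- "$base_url/api/v1/open/auth/login" |
    extract_token
}
```

Usage would look like: token=$(login "http://localhost:7880" "$user" "$password") || exit 1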

Organization create (with required "use" field):

payload=$(jq -n --arg name "$org_name" '{
  id: "", name: $name,
  address: {street: "", zip: "", city: "", country: ""},
  contacts: [], notes: null, created: 0, shortName: null,
  use: "correspondent"
}')
curl -s -X POST -H "X-Docspell-Auth: $token" \
  -H 'Content-Type: application/json' \
  -d "$payload" "http://localhost:7880/app/api/v1/sec/organization"

Document upload (matching web UI format):

meta_json='{"multiple":true,"flattenArchives":false,"direction":"incoming","folder":null,"skipDuplicates":true,"tags":null,"fileFilter":null,"language":null,"attachmentsOnly":null,"customData":null}'
curl -s -X POST -H "X-Docspell-Auth: $token" \
  -F "meta=$meta_json" \
  -F "file[]=@$filepath" \
  "http://localhost:7880/app/api/v1/sec/upload/item"

Check processing status:

curl -s -H "X-Docspell-Auth: $token" \
  "http://localhost:7880/app/api/v1/sec/checkfile/$checksum"

Returns: {"exists":false} while processing, {"exists":true,"items":[{"id":"..."}]} when done.
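
For completeness, here is a sketch of how $checksum might be computed and how the item ID could be pulled out of the response shown above. It assumes the endpoint expects the SHA-256 hex digest of the original file and that sha256sum (GNU coreutils) is available; file_checksum and first_item_id are hypothetical helper names.

```shell
# Prints the SHA-256 hex digest of a file (assumed to be the checksum
# that /checkfile/{checksum} expects).
file_checksum() {
  sha256sum "$1" | cut -d' ' -f1
}

# Reads a checkfile response on stdin and prints the first item's ID,
# or nothing if the file is not known yet.
first_item_id() {
  jq -r '.items[0].id // empty'
}
```

In the metadata pass this could be combined as: item_id=$(curl -s -H "X-Docspell-Auth: $token" "http://localhost:7880/app/api/v1/sec/checkfile/$(file_checksum "$filepath")" | first_item_id), with an empty item_id meaning the document is still being processed.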

Results

I successfully imported 51 files end‑to‑end using the two‑pass approach (upload first, then metadata after processing) with the following commands:

Pass 1: upload (fast, queues processing)

./import-paperless.sh \
  http://localhost:7880 \
  your_user \
  'YOUR_PASSWORD' \
  /home/user/paperless/data/db.sqlite3 \
  /home/user/paperless/media/documents/originals \
  upload

Pass 2: metadata (after processing finishes)

./import-paperless.sh \
  http://localhost:7880 \
  your_user \
  'YOUR_PASSWORD' \
  /home/user/paperless/data/db.sqlite3 \
  /home/user/paperless/media/documents/originals \
  metadata

Note: This run was orchestrated with Claude Code. The core changes were: aligning paths (auth at /api/v1/open/auth/login, secured endpoints under /app/api/v1/sec/…); building JSON payloads with printf/jq -n; fixing the upload format to file[] with the proper metadata JSON; and deferring metadata until items had IDs via /checkfile/{checksum}. I also disabled 2FA on my account during the import to keep things simple.

Request

Please clarify if the /api vs /app/api split is intended behavior when UI is served from /app
Consider updating import-paperless.sh to:
a. Support two-pass mode (upload, then metadata after processing)
b. Use correct API paths consistently
c. Include required fields like "use":"correspondent" for organizations
d. Use proper upload format (file[] with metadata JSON)
e. Build JSON via jq or printf to avoid quoting issues
f. Handle missing bc command (use $((seconds * 1000)) instead)
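
The bc replacement mentioned in the last point can be sketched with plain shell arithmetic. to_millis is a hypothetical helper name, and the input is assumed to be whole epoch seconds without leading zeros.

```shell
# Converts epoch seconds to milliseconds using shell arithmetic,
# so the script no longer depends on bc.
to_millis() {
  echo $(( $1 * 1000 ))
}
```

With GNU date this could be used as created_ms=$(to_millis "$(date -d "$created" +%s)"), where $created stands for a timestamp read from the Paperless database.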

Attached is the modified version of totti4ever's original import-paperless.sh that successfully imported the documents with full metadata from Paperless-ngx v2.18.4.

Thanks for the excellent project!

import-paperless.sh

Oct 06 '25 01:10 voyager

Hi @voyager Thank you for sharing your work! The script was 5 years old so this was probably some effort! 💪🏼 I myself don't run paperless and won't have time to look into it. If you want, you can make a PR (just paste your text into the readme so people have some info how to use it). The tools section is really a best-effort thing and I don't expect these scripts to work without taking a look into them in general. They are great if they just work :-) and also when being a starting point for the next person.

Regarding the API paths: this is new to me. There is no separation of these paths. The /app path is only ever for the UI, and all API calls are behind /api; this is how UI and API are separated. I just checked my installation, and the browser's network tab shows exactly this. Whenever I use /app/api/* (or /app/* in general), I get the HTML page and never an API response. So I'm not sure why you observed this. Really strange to me!

Oct 22 '25 11:10 eikek