
Improved API for document search and export

Open tpluscode opened this issue 1 year ago • 0 comments

Please excuse the single ticket for multiple related problems. I can create separate tickets if you'd prefer that.

Is your feature request related to a problem? Please describe.

I enabled API access to programmatically export documents. I found that I can retrieve project documents and then export annotations in the desired format. However, there are some shortcomings:

  1. The /api/aero/v1/projects/{projectId}/documents endpoint returns all documents, so filtering has to happen on the client side, for example to get only documents with state=CURATION-COMPLETE
  2. Documents can only be exported one by one with the /api/aero/v1/projects/{projectId}/documents/{documentId} endpoint
  3. When exporting, the response is always Content-Type: application/octet-stream. Additionally, sending an Accept header results in status 406 Not Acceptable, even if a matching media type is requested. I consider both of these bugs
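
For illustration, the current workaround looks roughly like the sketch below: list everything, filter by state on the client, then export one document at a time. The base URL, the `body` key in the listing response, the `id` and `state` field names, and the `format` query parameter on the export endpoint are assumptions made for this sketch, not confirmed details of the API. Authentication is omitted, and no Accept header is sent because of the 406 described above.

```python
import json
import urllib.request
from typing import Callable, Iterable

# Assumption: a local instance; adjust to the real host.
BASE = "http://localhost:8080/api/aero/v1"


def http_get(url: str) -> bytes:
    """Plain GET without an Accept header; authentication would need adding."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def filter_by_state(documents: Iterable[dict], state: str) -> list:
    """Client-side filtering, which the endpoint currently forces on callers."""
    return [doc for doc in documents if doc.get("state") == state]


def export_matching(
    project_id: int,
    state: str,
    fmt: str,
    fetch: Callable[[str], bytes] = http_get,
) -> dict:
    """Return {document id: exported bytes} for documents in the given state."""
    listing = json.loads(fetch(f"{BASE}/projects/{project_id}/documents"))
    documents = listing["body"]  # assumption: documents are listed under "body"
    exports = {}
    for doc in filter_by_state(documents, state):
        # One request per document: this is shortcoming 2 from the list above.
        url = f"{BASE}/projects/{project_id}/documents/{doc['id']}?format={fmt}"
        exports[doc["id"]] = fetch(url)
    return exports
```

The `fetch` parameter only exists so the function can be exercised without a running server; a real client would call `export_matching(1, "CURATION-COMPLETE", "conllu")` and pay one round trip per matching document.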

Describe the solution you'd like

Ideally, it would be possible to directly export all matching documents, without doing a search first. Something like:

GET /api/aero/v1/projects/{projectId}/documents{?format,state}

The response could be a ZIP archive containing each matching document exported in the chosen format.
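
To make the ZIP idea concrete, here is a small sketch of the bundling step such an endpoint would perform and of how a client could unpack the result. It uses only Python's standard `zipfile` module. The function names and the choice of one archive entry per document file name are hypothetical; nothing here reflects an existing implementation.

```python
import io
import zipfile


def bundle_as_zip(exports: dict) -> bytes:
    """Pack {file name: exported bytes} into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for filename, content in exports.items():
            archive.writestr(filename, content)
    return buffer.getvalue()


def read_zip(payload: bytes) -> dict:
    """Client side: unpack the archive back into {file name: bytes}."""
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}
```

A single archive would replace the per-document round trips with one request, at the cost of the server having to build the whole archive before (or while) responding.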

Describe alternatives you've considered

If a new endpoint is not feasible, it would be nice to introduce some improvements to the existing ones:

  1. Add a ?state query param to the document search endpoint
  2. Respond with a matching content type, such as rdfcas => text/turtle, conllu => text/plain, jsoncas => application/json, etc.
  3. (Optionally) Allow content negotiation of RDF formats. For example, a request with Accept: application/n-triples should be honoured and answered with RDF in N-Triples format
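
Points 2 and 3 could be sketched as follows. The format-to-media-type table repeats the mappings listed above; the list of offered RDF serialisations and all function names are hypothetical. The Accept handling is deliberately simplified (q-values and wildcards only) and is not a full implementation of HTTP content negotiation.

```python
from typing import Optional

# Mappings taken from the list above.
MEDIA_TYPE_BY_FORMAT = {
    "rdfcas": "text/turtle",
    "conllu": "text/plain",
    "jsoncas": "application/json",
}

# Hypothetical: serialisations the server could offer for RDF exports,
# in order of server preference.
RDF_SERIALISATIONS = ["text/turtle", "application/n-triples"]


def parse_accept(header: str) -> list:
    """Split an Accept header into (media type, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        quality = 1.0
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        entries.append((media_type, quality))
    # sorted() is stable, so header order is kept among equal q-values.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def negotiate(accept: Optional[str], offered: list) -> Optional[str]:
    """Pick a media type from `offered`; None would map to 406 Not Acceptable."""
    if not accept:
        return offered[0]
    for media_type, quality in parse_accept(accept):
        if quality <= 0:
            continue
        if media_type == "*/*":
            return offered[0]
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # e.g. "text/"
            for candidate in offered:
                if candidate.startswith(prefix):
                    return candidate
        elif media_type in offered:
            return media_type
    return None
```

With this in place, `negotiate("application/n-triples", RDF_SERIALISATIONS)` selects N-Triples, a missing Accept header falls back to the server's first choice, and only a request that matches nothing on offer would lead to a 406.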

tpluscode, May 05 '24 11:05