Improve CouchDB's understanding of multipart/related PUT requests

Open flimzy opened this issue 7 years ago • 13 comments

At present, to send a multipart/related PUT request to CouchDB, each attachment's length must be known beforehand, to create a JSON body such as:

{
  "_id":"doc_id",
  "_attachments":{
    "foo.txt": {
      "content_type": "text/plain",
      "follows": true,
      "length": 13
    }
  }
}

This defeats part of the benefit of using multipart/related to send attachments.

It would be nice if the length field could be omitted from the attachment stub, and instead inferred from the actual size of the respective part.

This would make it possible to write more efficient client code.
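To make the constraint concrete, here is a minimal Python sketch of how a client might assemble such a request body today. It is an illustration under assumptions, not CouchDB's or any client library's code: `build_multipart_related` is a hypothetical helper, the stub flag is spelled `follows` as in the CouchDB attachment documentation, and every attachment is held fully in memory so that `len(data)` is available before the first byte is sent.

```python
import json
import uuid


def build_multipart_related(doc_id, attachments, boundary=None):
    """Build (content_type, body) for a multipart/related PUT.

    `attachments` maps filename -> (content_type, data_bytes).
    Hypothetical helper for illustration only.
    """
    boundary = boundary or uuid.uuid4().hex

    # The JSON stub comes first, so every length must be known up front.
    stubs = {
        name: {"content_type": ctype, "follows": True, "length": len(data)}
        for name, (ctype, data) in attachments.items()
    }
    doc = {"_id": doc_id, "_attachments": stubs}

    delimiter = b"--" + boundary.encode("ascii")
    lines = [
        delimiter,
        b"Content-Type: application/json",
        b"",
        json.dumps(doc).encode("utf-8"),
    ]
    # Attachment parts follow in the same order as the stubs.
    for name, (ctype, data) in attachments.items():
        lines += [delimiter, b"Content-Type: " + ctype.encode("ascii"), b"", data]
    lines.append(delimiter + b"--")

    body = b"\r\n".join(lines) + b"\r\n"
    content_type = 'multipart/related; boundary="%s"' % boundary
    return content_type, body
```

The point of the sketch is the ordering: the stub with its `length` values is serialized before any attachment data, which is why the sizes cannot simply be discovered while streaming.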

flimzy avatar Dec 04 '17 19:12 flimzy

That would be good to have. Found an older Jira ticket about it:

https://issues.apache.org/jira/browse/COUCHDB-1956

And an associated PR https://github.com/apache/couchdb/pull/138 that was never merged.

nickva avatar Dec 04 '17 23:12 nickva

The client sending the request knows the length of each file, so you will have that info. ...That's not to say you're going to get multipart/related to work, but you can enjoy trying.

ronnievsmith avatar Jun 19 '23 21:06 ronnievsmith

The client sending the request knows the length of each file

Not necessarily.

That's not to say you're going to get multipart/related to work, but you can enjoy trying.

I have it working. It just requires the non-optimal overhead of calculating that size for every attachment before sending any data. In some cases it's just a nuisance. In other cases, it means buffering huge amounts of data before sending anything to the server.
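As a rough illustration of that overhead (a sketch, not the Go code discussed in this thread), the following Python helper drains a stream of unknown size into a spooled temporary file just to learn its length. The name `spool_and_measure` and the 1 MiB in-memory threshold are assumptions made for the example.

```python
import tempfile


def spool_and_measure(chunks, max_in_memory=1024 * 1024):
    """Consume an iterable of byte chunks and return (length, file_object).

    Nothing can be sent to the server until this loop has finished,
    because the length is only known once the source is exhausted.
    """
    spool = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    length = 0
    for chunk in chunks:
        spool.write(chunk)
        length += len(chunk)
    spool.seek(0)  # rewind so the caller can read the data back out
    return length, spool
```

Every attachment has to go through a step like this before the JSON preamble can be written, so the whole upload is read once to measure it and a second time to send it.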

flimzy avatar Jun 20 '23 09:06 flimzy

In what case / scenario are file sizes not known by the source/client machine?

Do you have this working in JavaScript? If so, any chance you could share your code here?

ronnievsmith avatar Jun 20 '23 14:06 ronnievsmith

In what case / scenario are file sizes not known by the source/client machine?

At the risk of sounding trite, any time the source is not a raw disk file. A common example would be reading an HTTP POST or PUT request. But the file size will not be known prior to reading the file in many other scenarios as well, such as reading from an archive or compressed file. It can also happen when generating the file on the fly, such as when converting between formats or creating a dump file.
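One such case can be sketched in Python, assuming the attachment arrives as a gzip-compressed stream. The generator name `iter_decompressed` is made up for this example. The decompressed size is only known after the last chunk has been processed, which is exactly when it is too late to put it in the JSON preamble.

```python
import zlib


def iter_decompressed(compressed_chunks):
    """Yield decompressed bytes from an iterable of gzip-compressed chunks."""
    # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
    for chunk in compressed_chunks:
        data = decompressor.decompress(chunk)
        if data:
            yield data
    tail = decompressor.flush()
    if tail:
        yield tail
```

A consumer of this generator can forward each piece as it appears, but it cannot report the total length until the generator is exhausted.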

flimzy avatar Jun 20 '23 14:06 flimzy

If you're reading an HTTP POST/PUT you're acting as a proxy in front of CouchDB. The machine that sent that HTTP request knows the file length / type and that's where you would want to build out the multipart/mixed body - on the client. I'm not sure I follow regarding streaming multiple files in a single HTTP PUT to Couch.

ronnievsmith avatar Jun 20 '23 14:06 ronnievsmith

The machine that sent that HTTP request knows the file length

Again, not necessarily.

I gather that you've not done a lot of stream-based programming over HTTP or otherwise.

flimzy avatar Jun 20 '23 15:06 flimzy

@ronnieroyston I'm curious what point you're trying to make here. Is it just that you don't see a need for this feature?

flimzy avatar Jun 20 '23 15:06 flimzy

Multipart transfer (like multipart/related) lets you upload metadata and data in the same request and is used when the total payload is small enough to upload again, in its entirety, if the connection fails.

Are you looking for HTTP 1.1 resumable upload support?

ronnievsmith avatar Jun 20 '23 20:06 ronnievsmith

No. I'm looking for exactly what the issue describes. The ability to use multipart/related without specifying a file size for each attachment in the JSON preamble.

I'm still unsure what point you're trying to make with these comments.

flimzy avatar Jun 21 '23 07:06 flimzy

I'm building out a Node.js proxy for CouchDB to add Authentication, Authorization, and Accounting (AAA) / role-based controls for browser-based applications (via JWTs). It's basically done but I got to the part where I'm implementing browser uploading multiple files at once and stumbled across this multipart/related + JSON request body, and this issue.

At this point my main interest is in your code. You said you had multiple file uploads via this mechanism working.

Is your code in JavaScript? If so, is it published on GitHub, or can you share it?

Have you noticed that their Fauxton (web admin tool) does not support multi file uploads?

Would you mind sharing your code?

ronnievsmith avatar Jun 22 '23 04:06 ronnievsmith

No, my code is in Go. But you are of course welcome to see it. The core logic is here: https://github.com/go-kivik/couchdb/blob/3d79fc7e4318319774faa359810f07fda77577ca/db.go#L448

flimzy avatar Jun 22 '23 07:06 flimzy

If you're interested https://github.com/ronnieroyston/couch-behind-node

ronnievsmith avatar Jun 25 '23 04:06 ronnievsmith