deltachat-core-rust icon indicating copy to clipboard operation
deltachat-core-rust copied to clipboard

Prototype dc_blob_t API

Open link2xt opened this issue 2 years ago • 9 comments

link2xt avatar Jan 01 '22 00:01 link2xt

i was more thinking about letting dc_blob_t handling a whole memory block, so instead of copying data, one could just return a const char* buffer.

at least for webxdc (aka w30), that seems to be easier. i did this already without a dc_blob_t at https://github.com/deltachat/deltachat-core-rust/pull/2826/files#diff-192a66af39de75ebe0319656e39b2f45c5738d5e5fef1e97eeb6567971a55d44R3035v - however, having a dc_blob_t returning a const char* could save one copy process.

webxdc could also pass ZipFile to dc_blob_t somehow and read as needed, that way a copying would also be minimal, however, that would be more work and probably not urgently needed as files are needed "at whole" anyway.

so, if dc_blob_t takes just a memory block, that would be easier for different implementations. or we probably need two cases for read-from-db and read-from-db.

r10s avatar Jan 01 '22 16:01 r10s

Directly returning a char* instead of copying into a void* buffer doesn't necessarily save a copy, once the blobs are in the database we'd have to copy them out of it anyway. But even if they don't we have to copy the bytes from the filesystem to in the memory, unless we use mmap (which we probably shouldn't).

dc_blob_t* blob = dc_msg_get_fileblob(msg);
if (blob == NULL)
    goto fail;
usize_t size = dc_blob_bytes(blob);
void* buf = malloc(size);
if (buf == NULL)
    goto fail;
if !dc_blob_read(blob, buf, size, 0)
    goto fail;
dc_blob_unref(blob);

vs (IIUC)

size_t size;
char* blob = dc_msg_get_fileblob(msg, &size);
if (blob == NULL)
    goto fail;
// do stuff
dc_blob_free(blob)

I think the second is fine in simple cases, but once you do not want the entire file in memory things get less fun:

// signature:
// char* dc_msg_get_fileblob(dc_msg_t* msg, size_t max_size, off_t offset, size_t *bytes)
size_t size;
char* blob = dc_msg_get_fileblob(msg, 9999, 0, &size);
if (blob == NULL)
    goto fail;
size_t actual_size = dc_blob_bytes(blob);
if (actual_size != size)
    goto fail;

e.g. you can no longer re-use the same buffer for reading multiple chunks and things like that.

flub avatar Jan 01 '22 17:01 flub

unless we use mmap (which we probably shouldn't)

You can't even mmap an SQLite blob, there is no API to do it: https://www.sqlite.org/c3ref/blob.html

link2xt avatar Jan 01 '22 17:01 link2xt

i was more thinking about letting dc_blob_t handling a whole memory block, so instead of copying data, one could just return a const char* buffer.

This API is mostly for media files, e.g. video files received in messages.

Maybe it's not needed at all and we can simply assume that files are small enough to be handled entirely in memory. Then we need another simpler API to wrap char * with size and custom unref to ensure the same malloc is used. Something like dc_buffer_t. And implementation of dc_blob_t can be postponed indefinitely, at least until DC supports sending large files as multiple fragments.

link2xt avatar Jan 01 '22 17:01 link2xt

k, it's is fine for me, we can add and implement that as needed for database-files.

Maybe it's not needed at all and we can simply assume that files are small enough to be handled entirely in memory.

yip, this is what we're doing with w30 / webxdc for now, and that is probably just fine, esp. when we keep in mind that we encourage the .xdc files to be small.

r10s avatar Jan 01 '22 17:01 r10s

So in json representation of the chat for example there will be and boolean wether it exists and if it does the ui loads the image for that chat_id?

Or will there be a filename / id api that is closer to the path we currently have?

Simon-Laux avatar Jan 02 '22 18:01 Simon-Laux

@Simon-Laux Sorry, don't understand what you are talking about. "there will be and boolean" - there will be what? Some word is missing. This PR is about moving media files to the database and providing C API to read them, so they will not have filenames eventually.

JSON representation of the chat? Is it about representation inside web app or dc_cmd_api JSON API?

link2xt avatar Jan 02 '22 18:01 link2xt

yeah its about the json representation. So we will still some-kind of api call that tells us that a blob exists for a given chat/contact/message?

Simon-Laux avatar Jan 02 '22 20:01 Simon-Laux

Long-term plan is to have API

dc_blob_t *dc_msg_get_blob(const dc_msg_t *msg);

replacing dc_msg_get_file.

But dc_msg_get_file is not going away, it will still unpack the blob from the database to temporary storage and return filename for compatibility.

link2xt avatar Jan 02 '22 22:01 link2xt

Moving this to project resurrection. Before we do this we will likely add a simple PUT/GET API to JSON-RPC that can be used with bots without accessing the filesystem, and then can copy it to C API if needed. It's unclear whether an ability to read blobs with offsets is needed for the file sizes we have.

link2xt avatar Dec 04 '22 18:12 link2xt