tinycbor icon indicating copy to clipboard operation
tinycbor copied to clipboard

Feature: parser function for string access by pointer

Open bergzand opened this issue 7 years ago • 14 comments
trafficstars

I have a few use cases where I have a CBOR stream with a rather large byte strings. Having a function to get a pointer and length to these byte string would help saving some memory as a buffer to temporarily copy the data to is not required. Something like

CborError cbor_value_get_byte_string(CborValue *value, uint8_t **result, size_t *len);

would be perfect. This could return an CborErrorUnknownLength when a chunked string is passed.

My main use case is a COSE library where I want to retrieve the payload without having to copy the to another buffer. Simply accessing it with a pointer is preferred to save memory on a constraint device.

I'm willing to contribute the code myself, but I need a few pointers on how to get the pointer to the actual data.

bergzand avatar May 09 '18 08:05 bergzand

Hello Koen

I had that exact function in 0.5 but I removed it for further development. I un-reverted it in the dev branch, in edd2b9e1fd2844bac22e1b850ccedee7a15c8905. Please note that I do plan on changing it again, see 8b539f233fd608042e6da921f0924c4ec141f678.

thiagomacieira avatar May 09 '18 15:05 thiagomacieira

The problem I have to solve is that "pointer to the actual data" may not be a simple thing, especially in constrained devices. Take the example of Zephyr: instead of one buffer containing all data (a "linear" buffer), the data received from the network is stored in a struct netbuf, which is a linked (chained) list of 128-byte chunks. So we need to return to the caller not a pointer to the beginning of the data, but instead a struct netbuf plus offset.

thiagomacieira avatar May 09 '18 15:05 thiagomacieira

@thiagomacieira Thanks for the clarification! I understand that my request might not be as simple as I've thought it to be :)

Please let me know if I can help you in some way with testing.

bergzand avatar May 10 '18 09:05 bergzand

I'd second this FR, for the non-stream parsing use case.

The library is great for stream parsing, but for the case where I have the entire CBOR object available, extracting the byte/text strings comes with a penalty of an extra copy or allocation plus copy.

When I have the entire object, the string is in a contiguous chunk and readily available, so I only need to know where it starts and its length, which the decoder knows of already. An attempt to get the ref should of course fail if the string is chunked (in which case the caller still has the option of having the library allocate/copy).

bpintea avatar Jul 12 '19 10:07 bpintea

That is true and the API I designed works for when the entire message is available in a contiguous chunk. It's not so good when that isn't the case. I've managed to make it work with an arbtrary Qt's QIODevice and avoid double-copy, but the API is fragile. We need a better API.

thiagomacieira avatar Jul 15 '19 14:07 thiagomacieira

any news with respect to this? would it be possible to expose: CborError get_string_chunk(CborValue *it, const void **bufferptr, size_t *len)in cbor.h?

This could be a good workaround AFAIK and would allow direct access to a pointer to the start of the string.

jleni avatar Dec 17 '19 17:12 jleni

The problem is committing to an API that isn't long-term supportable, especially on small, RTOSes where the buffer may not be a contiguous chunk of memory.

thiagomacieira avatar Dec 17 '19 18:12 thiagomacieira

yes, I understand that.. but enforcing copies is even worse for small devices.. maybe get_string_chunk under conditional compilation for people that want that option would be useful.

jleni avatar Dec 17 '19 19:12 jleni

I know. Adding something under conditional compilation is the same as you performing the hack: I won't support you if it breaks later when we have a stable API.

thiagomacieira avatar Dec 17 '19 22:12 thiagomacieira

I'm also very keen to see this functionality (for me, direct pointer access is not necessary but memory must be stack/static only).

Can I suggest a string iterator which allows you to read chunks of a string up to a maximum length? For example:

// pretend that an iterator 'it' already exists
char small_buffer[32];
CborStringIterator string_it;
size_t actual_chunk_size = 0;

cbor_create_string_iterator(&it, &string_it); // 'it' currently points to a text string or byte string
while (!cbor_string_iterator_at_end(&string_it)) {
    cbor_string_iterator_copy_and_advance(&string_it, small_buffer, 32, &actual_chunk_size);
    
    for (size_t i=0; i<actual_chunk_size; i++) {
        // 0 <= actual chunk size <= 32
        do_some_work(small_buffer[i]);
    }
}

This way the alignment of any chunks is not relevant; the copy function can return any number of bytes up to a maximum.

jeremyherbert avatar Jan 04 '20 08:01 jeremyherbert

It is a good idea but I would open another feature request for that case. I think the current feature request is specific about having access to a direct pointer to the CBOR buffer to avoid any kind of copies or additional memory usage. I find this specific use case very important in memory-constrained embedded environments.

jleni avatar Jan 04 '20 12:01 jleni

Please open a new request for that, but please explain how cbor_value_copy_{text,byte}_string won't work for you.

thiagomacieira avatar Jan 09 '20 04:01 thiagomacieira

I'm also looking for direct access to a byte string pointer with definite length. I would have thought this is a common use case somehow.

letmaik avatar Jun 21 '21 10:06 letmaik

Somewhat. I did hack around that issue for my own code...

thiagomacieira avatar Jun 21 '21 15:06 thiagomacieira