CRoaring
CRoaring copied to clipboard
Exposing the roaring64_iterator_t struct in the roaring64 header
I wanted to see if there's any chance of moving the roaring64_iterator_s definition from roaring64.c to the header. My reasoning for this is that the original roaring.h had everything exposed which allowed for referencing/dereferencing certain values. I was trying to update some code to allow for similar functionality with the option of using either 32 or 64bits and everything was available except this.
@nardyy01 could you expand on your use-case? Keeping the definition hidden allows for future changes to the struct without breaking users, so this change would need to have appropriate motivation.
Would having it in the header here be much different from roaring.h (with warning comments of only accessing value)? I am modifying company code, so I need to use the submodule for proper access to the bitmaps.
As far as use-case, we handle large datasets that are often compressed and stored in multi-dimensional containers. In most areas we are passing around references while processing the data in chunks to keep up the performance -- maintaining access to the reference allows for quick access and reduced calls overall.
Would having it in the header here be much different from roaring.h (with warning comments of only accessing value)?
Can you be specific? There is no roaring_iterator_t or roaring_iterator_s, right?
As far as use-case, we handle large datasets that are often compressed and stored in multi-dimensional containers. In most areas we are passing around references while processing the data in chunks to keep up the performance -- maintaining access to the reference allows for quick access and reduced calls overall.
So what you want to do it capture the iterator by value rather than as a pointer?
This is what it looks like in roaring.h currently -- but for 64 bits the definition is in roaring64.c
Obviously, if you modify the underlying bitmap, the iterator
becomes invalid. So don't.
*/
/**
* A struct used to keep iterator state. Users should only access
* `current_value` and `has_value`, the rest of the type should be treated as
* opaque.
*/
typedef struct roaring_uint32_iterator_s {
const roaring_bitmap_t *parent; // Owner
const ROARING_CONTAINER_T *container; // Current container
uint8_t typecode; // Typecode of current container
int32_t container_index; // Current container index
uint32_t highbits; // High 16 bits of the current value
roaring_container_iterator_t container_it;
uint32_t current_value;
bool has_value;
} roaring_uint32_iterator_t;
Its not really capturing the value I suppose, since it could be modified if not called with const-- its more just getting direct access to it by returning a reference or pointer to the address ( uint_32& or uint32_t*) of an instance of current_value which could be stored and accessed freely
@nardyy01 Thanks. This is much clearer.
@SLieve What do you think? I submit to you that we are probably unlikely to change this struct in the future.
FYI Issue #669 will almost certainly need this, as creating c++ iterators will need to be able to provide operator*() and operator->() conferencing operators to be able to complete the iterators. We have these iterators in our c++ wrapper for 32bit and are also trying to make them for 64bit.
It is just a matter of copying and pasting code, but I'd really like to get @SLieve's ok before doing so.
I have gotten very distracted and haven't pushed my implementation for #669 but the way it works is that I create a local struct that contained enough info, and the proper operators, to allow it to appear as an lvalue and update the bitmap. Then the iterator worked with that. I don't remember needing to expose any internals. Although I'm not sure I got the 64bit version working completely. Of course, it's not a requirement to make internals fully public, in C++, to allow an iterator special access. It's very common to use friend or similar.
@madscientist
Of course, it's not a requirement to make internals fully public, in C++, to allow an iterator special access. It's very common to use friend or similar.
In C, we can document that only some fields should be accessed. I don't think that it is a critical issue.
@SLieve's point seems different: he does not want to expose the definition. This is sensible, but we can revisit this choice.
I'm still not quite getting the use-case here. You'd like to get a reference or pointer to the current iterator value? If that's the case, I have two thoughts:
- The current iterator value does not point to the value in the bitmap, so modifying the iterator value does not modify the bitmap.
- Passing an integer by reference or pointer has no advantage in terms of efficiency over passing it by value.
@SLieve Hmm since the struct is using a copy, I suppose these are valid points. I have modified the code to pass by value and it seems to be handling similarly. I will update if any performance differences are noticed.