CRoaring icon indicating copy to clipboard operation
CRoaring copied to clipboard

Exposing the roaring64_iterator_t struct in the roaring64 header

Open nardyy01 opened this issue 1 year ago • 12 comments

I wanted to see if there's any chance of moving the roaring64_iterator_s definition from roaring64.c to the header. My reasoning for this is that the original roaring.h had everything exposed which allowed for referencing/dereferencing certain values. I was trying to update some code to allow for similar functionality with the option of using either 32 or 64bits and everything was available except this.

nardyy01 avatar Nov 11 '24 19:11 nardyy01

@nardyy01 could you expand on your use-case? Keeping the definition hidden allows for future changes to the struct without breaking users, so this change would need to have appropriate motivation.

SLieve avatar Nov 11 '24 21:11 SLieve

Would having it in the header here be much different from roaring.h (with warning comments of only accessing value)? I am modifying company code, so I need to use the submodule for proper access to the bitmaps.

As far as use-case, we handle large datasets that are often compressed and stored in multi-dimensional containers. In most areas we are passing around references while processing the data in chunks to keep up the performance -- maintaining access to the reference allows for quick access and reduced calls overall.

nardyy01 avatar Nov 11 '24 22:11 nardyy01

Would having it in the header here be much different from roaring.h (with warning comments of only accessing value)?

Can you be specific? There is no roaring_iterator_t or roaring_iterator_s, right?

As far as use-case, we handle large datasets that are often compressed and stored in multi-dimensional containers. In most areas we are passing around references while processing the data in chunks to keep up the performance -- maintaining access to the reference allows for quick access and reduced calls overall.

So what you want to do it capture the iterator by value rather than as a pointer?

lemire avatar Nov 11 '24 22:11 lemire

This is what it looks like in roaring.h currently -- but for 64 bits the definition is in roaring64.c

Obviously, if you modify the underlying bitmap, the iterator
becomes invalid. So don't.
*/

/**
 * A struct used to keep iterator state. Users should only access
 * `current_value` and `has_value`, the rest of the type should be treated as
 * opaque.
 */
typedef struct roaring_uint32_iterator_s {
    const roaring_bitmap_t *parent;        // Owner
    const ROARING_CONTAINER_T *container;  // Current container
    uint8_t typecode;                      // Typecode of current container
    int32_t container_index;               // Current container index
    uint32_t highbits;                     // High 16 bits of the current value
    roaring_container_iterator_t container_it;

    uint32_t current_value;
    bool has_value;
} roaring_uint32_iterator_t;

nardyy01 avatar Nov 11 '24 23:11 nardyy01

Its not really capturing the value I suppose, since it could be modified if not called with const-- its more just getting direct access to it by returning a reference or pointer to the address ( uint_32& or uint32_t*) of an instance of current_value which could be stored and accessed freely

nardyy01 avatar Nov 12 '24 00:11 nardyy01

@nardyy01 Thanks. This is much clearer.

@SLieve What do you think? I submit to you that we are probably unlikely to change this struct in the future.

lemire avatar Nov 12 '24 01:11 lemire

FYI Issue #669 will almost certainly need this, as creating c++ iterators will need to be able to provide operator*() and operator->() conferencing operators to be able to complete the iterators. We have these iterators in our c++ wrapper for 32bit and are also trying to make them for 64bit.

josh08287 avatar Nov 12 '24 13:11 josh08287

It is just a matter of copying and pasting code, but I'd really like to get @SLieve's ok before doing so.

lemire avatar Nov 12 '24 14:11 lemire

I have gotten very distracted and haven't pushed my implementation for #669 but the way it works is that I create a local struct that contained enough info, and the proper operators, to allow it to appear as an lvalue and update the bitmap. Then the iterator worked with that. I don't remember needing to expose any internals. Although I'm not sure I got the 64bit version working completely. Of course, it's not a requirement to make internals fully public, in C++, to allow an iterator special access. It's very common to use friend or similar.

madscientist avatar Nov 12 '24 19:11 madscientist

@madscientist

Of course, it's not a requirement to make internals fully public, in C++, to allow an iterator special access. It's very common to use friend or similar.

In C, we can document that only some fields should be accessed. I don't think that it is a critical issue.

@SLieve's point seems different: he does not want to expose the definition. This is sensible, but we can revisit this choice.

lemire avatar Nov 12 '24 20:11 lemire

I'm still not quite getting the use-case here. You'd like to get a reference or pointer to the current iterator value? If that's the case, I have two thoughts:

  1. The current iterator value does not point to the value in the bitmap, so modifying the iterator value does not modify the bitmap.
  2. Passing an integer by reference or pointer has no advantage in terms of efficiency over passing it by value.

SLieve avatar Nov 13 '24 19:11 SLieve

@SLieve Hmm since the struct is using a copy, I suppose these are valid points. I have modified the code to pass by value and it seems to be handling similarly. I will update if any performance differences are noticed.

nardyy01 avatar Nov 16 '24 14:11 nardyy01