sucds icon indicating copy to clipboard operation
sucds copied to clipboard

CompactVector extends vs from_slice improvement suggestions

Open KonradHoeffner opened this issue 5 months ago • 5 comments

I got a bit tripped up by the differences between the CompactVector extends and from_slice functions:

  • from_slice takes references and returns an error if the values cannot be cast to usize
  • extends takes owned usize values instead, so you cannot extend with a slice without using myslice.copied() (which makes it slower?)

Is it on purpose that one function takes ToPrimitive and the other usize directly? I think it would be more intuitive if both have the same input types. Also, is it possible to ensure on the compiler level that T fits into usize so the Result type is not necessary?

For example can:

  pub fn from_slice<T>(vals: &[T]) -> Result<Self>
    where
        T: ToPrimitive,

be changed to this?

  pub fn from_slice<T>(vals: &[T]) -> Result<Self>
    where
        T: T: Copy + Into<usize>,

KonradHoeffner avatar Jul 21 '25 09:07 KonradHoeffner

I started to try this myself in a fork at https://github.com/KonradHoeffner/sucds but noticed that there are a lot of cascading changes so before I push this any further I would like to know if a pull request addressing this would be interesting for you :-)

First I noticed that ToPrimitive can actually be replaced by Into<usize> + Copy in many places with slight changes, then the compiler actually can ensure it at compile time, which is great.

There is a small ergonomics downside that you cannot use i32 anymore as it doesn't fit into usize (i32 may be negative) but I think that is a reasonable assumption to make and for example for testing with constants easily doable by adding "usize" to the first number in the constant array.

However then I found out that usize does not even implement From because of theoretically possible 16 bit systems, so I started to use u64 instead as this library is intended for 64 bit systems anyways.

But at this point before I do any more I would welcome your input :-)

KonradHoeffner avatar Jul 22 '25 16:07 KonradHoeffner

@KonradHoeffner

Thank you for your suggestion!

I'll check the code on Saturday or Sunday. Please wait a bit.

kampersanda avatar Jul 24 '25 02:07 kampersanda

@KonradHoeffner

I checked the code and agree with your suggestion.

the compiler actually can ensure it at compile time, which is great.

I completely agree. We should aim to minimize runtime errors wherever possible. Additionally, delegating straightforward operations like type conversion to outside the function can lead to a simpler and more robust function design.

However then I found out that usize does not even implement From because of theoretically possible 16 bit systems, so I started to use u64 instead as this library is intended for 64 bit systems anyways.

Do you mean to implement Copy + Into<u64>? Either way, I'm fine with your changes.

kampersanda avatar Jul 26 '25 02:07 kampersanda

Yes, sorry, I mean Copy + Into<u64> but also to do this:

pub struct BitVector {
-    words: Vec<usize>,
+    words: Vec<u64>,

Because on 64bit systems which this library is designed for it's the same but otherwise you cannot convert it directly.

If that is fine with you I would continue the work early in the upcoming week and create a pull request when it's finished!

KonradHoeffner avatar Jul 26 '25 09:07 KonradHoeffner

@KonradHoeffner

(Sorry for the late reply)

LGTM. Since this library is designed for 64-bit systems, there's no real benefit to keeping BitVector generalized with usize. I agree with your suggestion. I'm looking forward to your pull request.

kampersanda avatar Jul 28 '25 07:07 kampersanda