SOMns icon indicating copy to clipboard operation
SOMns copied to clipboard

Remove Bit Operations from Number Types and Introduce BitArray

Open smarr opened this issue 6 years ago • 5 comments

I am unhappy with the fact that we have bit operations and cast to 32bit or so on number types.

In the presence of arbitrary length integers, these operations are problematic. On the one hand, they reveal the internal representation (two-complement). But perhaps more important, their semantics are problematic with respect to the arbitrary length. For most algorithms that use bit operations, we want fixed-length bit representations.

Examples for bit vectors, bit sets, can be found for instance in Common Lisp, Java, Scala.

I am wondering whether 32bit-wide bit arrays are sufficient for many of the things we want to do (i.e., the benchmarks we care about (there was a new one for F/J which I could not yet implement). If that's the case, we could map them directly to Java's int, which we currently don't use for anything.

Another question about the design is whether we should treat them as Objects or Values. Do we need/want identity? Is it useful to have a reference to the changing BitArray?

And finally the name:

  • BitArray this is my current choice, it is an array of bits (fixed size), or do we need a BitVector?
  • BitSet as in Java or Scala doesn't make sense to me. It's not a set of bits is it? The Java doc even says it's a vector of bits.
  • BitMap, better than BitSet

Tension between BitArray and BitVector:

  • << operation semantics, wrapping on current size, or extending? >> and what with?
  • we could of course have extending and non-extending (i.e. wrapping) operations
  • however, main use case is for porting algorithms that do word-sized operations

smarr avatar Jan 28 '18 17:01 smarr

Integer types are surprisingly hard, and I'm not sure there's a perfect solution (though some imperfect solutions are even more imperfect than others).

I think it's fine to define << as working on arbitrary length integers, in the sense that they can just widen as needed (which is what https://github.com/softdevteam/yksom/pull/43 and Python do). I also think it's fine to say they just work on machine words, though this is more surprising in a language like SOM that gives the illusion that integers can be arbitrarily big. On that basis, my gut feeling is that the "default" << should widen as needed, and a second method (word_shl or whatever it might be called) should restrict things to a machine word (whatever the word size of the machine is). Or maybe it would be best not to even have << as an operator, and that would enable one to pick a particular semantics and document it in a method name more clearly?

ltratt avatar Sep 22 '19 13:09 ltratt

Yeah, I tend towards not having a << operator. For me << comes with >>, >>>, bitXor:, bitAnd:, as32BitUnsignedValue and as32BitSignedValue. They all fall into the category of "needed for benchmarks that do bit-fiddling". The relevant algorithms rarely care about the actual integer as a number value, but usually just as a bit pattern.

[I am not opposed to widening for yksom.]

smarr avatar Sep 22 '19 13:09 smarr

I am just the implementer, so whatever the language committee decides I will implement :)

ltratt avatar Sep 22 '19 16:09 ltratt

For SOM, your solution seems fine, it's not violating any assumptions as far as I can tell. Perhaps benchmarks may disagree at some point.

Also note this specific discussion here is for SOMns, not SOM (see https://github.com/SOM-st/SOM/issues/21).

smarr avatar Sep 22 '19 17:09 smarr

Mea culpa! Sorry, I didn't realise which repo was which!

ltratt avatar Sep 22 '19 17:09 ltratt