
Bitwise operations

Open kozross opened this issue 2 years ago • 31 comments

This is a work-in-progress implementation of this CIP, which ultimately aims to solve #4252. I have left the pre-submit checklist in place (below) to be filled in as this gets finished.

Edit: This is now complete. @michaelpj - requesting a review.

Below is the checklist of tasks that have been done, and those that still remain outstanding:

  • integerToByteString
    • [X] Implementation
    • [X] Plutus Core builtin
    • [X] Tests for Plutus Core builtin
    • [X] PlutusTx primitive
  • byteStringToInteger
    • [X] Implementation
    • [X] Plutus Core builtin
    • [X] Tests for Plutus Core builtin
    • [X] PlutusTx primitive
  • andByteString
    • [X] Implementation
    • [X] Plutus Core builtin
    • [X] Tests for Plutus Core builtin
    • [X] PlutusTx primitive
  • iorByteString
    • [X] Implementation
    • [X] Plutus Core builtin
    • [X] Tests for Plutus Core builtin
    • [X] PlutusTx primitive
  • xorByteString
    • [X] Implementation
    • [X] Plutus Core builtin
    • [X] Tests for Plutus Core builtin
    • [X] PlutusTx primitive
  • complementByteString
    • [X] Implementation
    • [X] Plutus Core builtin
    • [X] Tests for Plutus Core builtin
    • [X] PlutusTx primitive
  • shiftByteString
    • [X] Implementation
    • [X] Plutus Core builtin
    • [X] Tests for Plutus Core builtin
    • [X] PlutusTx primitive
  • rotateByteString
    • [X] Implementation
    • [X] Plutus Core builtin
    • [X] Tests for Plutus Core builtin
    • [X] PlutusTx primitive
  • popCountByteString
    • [X] Implementation
    • [X] Plutus Core builtin
    • [X] Tests for Plutus Core builtin
    • [X] PlutusTx primitive
  • testBitByteString
    • [X] Implementation
    • [X] Plutus Core builtin
    • [X] Tests for Plutus Core builtin
    • [X] PlutusTx primitive
  • writeBitByteString
    • [X] Implementation
    • [X] Plutus Core builtin
    • [X] Tests for Plutus Core builtin
    • [X] PlutusTx primitive
  • findFirstSetByteString
    • [X] Implementation
    • [X] Plutus Core builtin
    • [X] Tests for Plutus Core builtin
    • [X] PlutusTx primitive

Pre-submit checklist:

  • Branch
    • [X] Tests are provided (if possible)
    • [X] Commit sequence broadly makes sense
    • [X] Key commits have useful messages
    • [X] Relevant tickets are mentioned in commit messages
    • [X] Formatting, PNG optimization, etc. are updated
  • PR
    • [X] (For external contributions) Corresponding issue exists and is linked in the description
    • [X] Targeting master unless this is a cherry-pick backport
    • [X] Self-reviewed the diff
    • [X] Useful pull request description
    • [X] Reviewer requested

kozross avatar Jun 24 '22 01:06 kozross

Let us know when you want reviews, or comment on things that you're unsure about!

michaelpj avatar Jun 28 '22 09:06 michaelpj

I'm getting line-length related blowups here. I used fix-stylish-haskell, but this doesn't seem to address line length issues. In fact, it seems to completely ignore them in some cases, such as imports.

@michaelpj: Am I missing something here? I've literally had fix-stylish-haskell undo some of my line-length-related fixes in the past, and this never used to fail CI before.

kozross avatar Jul 06 '22 00:07 kozross

I'm also getting utterly bizarre linker errors on MinGW:

/build/ghc124461_0/ghc_1.s: Assembler messages:

/build/ghc124461_0/ghc_1.s:42:0: error:
     Error: CFI instruction used without previous .cfi_startproc
   |
42 |         .cfi_def_cfa rsp, 0
   | ^

/build/ghc124461_0/ghc_1.s:43:0: error:
     Error: CFI instruction used without previous .cfi_startproc
   |
43 |         .cfi_offset rbx, 8192
   | ^

/build/ghc124461_0/ghc_1.s:44:0: error:
     Error: CFI instruction used without previous .cfi_startproc
   |
44 |         .cfi_offset rbp, 8200
   | ^

/build/ghc124461_0/ghc_1.s:45:0: error:
     Error: CFI instruction used without previous .cfi_startproc
   |
45 |         .cfi_offset r12, 8208
   | ^

/build/ghc124461_0/ghc_1.s:46:0: error:
     Error: CFI instruction used without previous .cfi_startproc
   |
46 |         .cfi_offset r13, 8216
   | ^

/build/ghc124461_0/ghc_1.s:47:0: error:
     Error: CFI instruction used without previous .cfi_startproc
   |
47 |         .cfi_offset r14, 8224
   | ^

/build/ghc124461_0/ghc_1.s:48:0: error:
     Error: CFI instruction used without previous .cfi_startproc
   |
48 |         .cfi_offset r15, 8232
   | ^

/build/ghc124461_0/ghc_1.s:49:0: error:
     Error: CFI instruction used without previous .cfi_startproc
   |
49 |         .cfi_offset rip, 8336
   | ^

/build/ghc124461_0/ghc_1.s:50:0: error:
     Error: CFI instruction used without previous .cfi_startproc
   |
50 |         .cfi_escape 0x16, 0x07, 0x04, 0x77, 152, 65
   | ^
`x86_64-w64-mingw32-cc' failed in phase `Assembler'. (Exit code: 1)
make[1]: *** [rts/ghc.mk:325: rts/dist/build/StgCRun.o] Error 1

This also never used to be a problem. @michaelpj - did something change?

kozross avatar Jul 06 '22 00:07 kozross

I'm getting line-length related blowups here.

The line length check is new. We had it in editorconfig before, but now it's enforced. Indeed, stylish-haskell does not fix it; you have to do it yourself.

We're going to try and ratchet towards general conformance, but a lot of files are ignored. I would be fine with you including your files in that, just add -- editorconfig-checker-disable-file to the top of the file.

I'm also getting utterly bizarre linker errors on MinGW

That looks like you're building GHC, which looks like it's because you've updated both haskell.nix and nixpkgs :o Please don't do that, it's a big deal and will result in rebuilds of GHC and the rest of the world. Also they're not guaranteed to go through, especially since you've probably just pulled in the tip of nixpkgs-unstable!

michaelpj avatar Jul 06 '22 09:07 michaelpj

@michaelpj - the problem with the line length check is that stylish-haskell actively works against me. For example, consider this (overlong) import in Bitwise.hs:

import Data.Bits (FiniteBits, bit, complement, popCount, rotate, shift, shiftL, xor, zeroBits, (.&.), (.|.))

This is over 100 characters wide. If I reformat it to this (which is within an 80-character width limit):

import Data.Bits (
  FiniteBits, bit, complement, popCount, rotate, shift, shiftL, xor, zeroBits, 
  (.&.), (.|.)
  )

When I run fix-stylish-haskell, I get this again:

import Data.Bits (FiniteBits, bit, complement, popCount, rotate, shift, shiftL, xor, zeroBits, (.&.), (.|.))

This is fundamentally a problem with stylish-haskell; not only can it not break overlong lines, it actively works against you if you try. This is one of the (many!) reasons it is a suboptimal code styling tool.

I can certainly ignore the checker as you suggested, but this is a lurking problem that won't go away.

kozross avatar Jul 06 '22 21:07 kozross

Hi, I am excited to see these builtins in plutus. I wondered what the current status is of the path to Active for this CIP? Is this on track for the next HF? I see that @kozross checked all the boxes of the implementation; when is the right time to merge this? I assume that the costing still has to be done?

perturbing avatar Jan 04 '23 09:01 perturbing

I think we were waiting for the CIP to make progress. Also this is somewhat reliant on having a keen sponsor pushing it through. If Koz/MLabs have lost interest then someone else would need to take it up.

michaelpj avatar Jan 04 '23 18:01 michaelpj

I think we were waiting for the CIP to make progress. Also this is somewhat reliant on having a keen sponsor pushing it through. If Koz/MLabs have lost interest then someone else would need to take it up.

The sponsor in question is IOG: MLabs performed this work pursuant to IOG funding; we will happily take it up again if IOG deems it worthwhile.

Benjmhart avatar Jan 04 '23 19:01 Benjmhart

CIP merged, I think the only thing remaining is going through the remaining comments by Michael?

L-as avatar Mar 07 '23 15:03 L-as

I just realised that integerToByteString is problematic because you can't control the length of the output. This is important because all operations only work on arguments of equal length. Padding out a bytestring to the correct length is not zero-cost, since we need to create a helper bytestring which we append to the output, resulting in 3 allocated bytestrings rather than 1.

When doing cryptography using these functions, we'd always be mod-ing all arithmetic operations, hence conversion to and from must be fast. In addition, we would never be using negative numbers, instead representing negative numbers as the upper half of the range we choose, hence sign-extension isn't a worry. If the integer is too big for the chosen length, we cut off the top.

We'd still want to support converting an arbitrary-length integer to a bytestring, as there isn't a way of calculating the necessary length efficiently. This could be done by regarding a chosen length of 0 as "unlimited". I'm not sure if this is a good idea. In any case, I assume this should be added to the CIP?

L-as avatar Mar 10 '23 16:03 L-as

Perhaps the real solution is adding "extension" operations (that can also shorten): one that sign-extends and one that doesn't.
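As a hedged sketch of what such an operation could look like, modelling bytestrings as big-endian lists of Word8 (the name zeroExtendOrTruncate and the left-padding convention are illustrative assumptions, not part of the CIP):

```haskell
import Data.Word (Word8)

-- Hypothetical "extension" operation over a big-endian [Word8] model of
-- bytestrings: pad with zero bytes on the left up to the requested
-- length, or drop the topmost bytes if the input is too long.
zeroExtendOrTruncate :: Int -> [Word8] -> [Word8]
zeroExtendOrTruncate n bs
  | len < n   = replicate (n - len) 0 ++ bs
  | otherwise = drop (len - n) bs
  where
    len = length bs
```

A sign-extending variant would presumably pad with 0xFF instead of 0 when the top bit of the first byte is set.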

L-as avatar Mar 10 '23 16:03 L-as

If we switch BuiltinByteString to ShortByteString, conversion from/to integers would be O(1), which would be quite a big efficiency-boost I'd imagine. The only downside is that slicing becomes O(n), but I'm not sure this is an issue, as slicing isn't commonly used anyways. @michaelpj ?

Edit: Not entirely straightforward, since BigNat# has the invariants that: 1) its length is a multiple of the word size and 2) the top-most word is non-zero. If we assume word sizes are 64 bits, we can still take advantage of this though! On 64-bit platforms, integerToByteString can be truly O(1). byteStringToInteger can be too if we err if it's not word-sized. On non-64-bit platforms it gets a bit awkward, but it might still be worth it. This would obviously require changing the specification of the conversion functions to match the invariants of BigNat# on 64-bit platforms.

Edit 2: We don't need to change the specification. BuiltinByteString would be composed of a ByteArray# and an explicit additional length, such that the underlying array always fulfills the invariants of BigNat#. If the explicit length is shorter than the underlying array, then the extra bytes must be zero. If the explicit length is longer than the underlying array, the missing bytes can be assumed to be zero. The real length must always be a multiple of the word size, with a top-most non-zero-word. Hence, the explicit length can not be a full word less than the real length. With this, integerToByteString is truly O(1). The explicit length should be such that the top-most byte is non-zero. You can then slice to N bytes (equivalent to modulus of that power of two), which will create a new BuiltinByteString with the same invariants, an explicit length of N, and a real length of ceil(N / wordsize)*wordsize. Converting this back to an Integer is also O(1), as we can use the underlying array as-is, discarding the explicit length. The downside of slicing being slower still holds, albeit you could work around this by keeping around an explicit offset too, but this would simply delay the copy to the conversion back to an integer, which might still be worth it, but doesn't seem important enough to me.
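As a toy model of the scheme in Edit 2 (names and the list representation are purely illustrative; the real BuiltinByteString would wrap a ByteArray#, not a list):

```haskell
import Data.Word (Word8)

-- Toy model: the underlying (word-aligned) bytes plus an explicit
-- length. Bytes past the end of the underlying array are implicitly
-- zero; bytes past the explicit length must already be zero.
data BBS = BBS { underlying :: [Word8], explicitLen :: Int }

-- Reading byte i under the scheme: a real byte where one exists, an
-- implicit zero otherwise.
byteAt :: BBS -> Int -> Word8
byteAt (BBS u _) i
  | i < length u = u !! i
  | otherwise    = 0

-- O(1) zero-extension under this representation: only the explicit
-- length changes; the underlying array is untouched.
zeroExtend :: Int -> BBS -> BBS
zeroExtend n (BBS u l) = BBS u (max n l)
```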

L-as avatar Mar 10 '23 16:03 L-as

A consequence of the previous scheme is that we can implement an O(1) zero-extension function zeroExtendByteString :: Integer -> BuiltinByteString -> BuiltinByteString that increments the explicit length without touching the underlying array. This might be the only thing we need to add to/change in the specification. Then, if you have x and y, 64-bit signed integers represented as the [0..2^64[ range in Integer, to do an addition, we sum the integers, convert to bytestring (almost no-op), zero-extend to 8 bytes (almost no-op, no-op if already above 8 bytes), AND with 0xFFFFFFFFFFFFFFFF, convert back (almost no-op). This seems reasonable.

An alternative design would be to change slicing such that an out-of-bounds range, rather than being invalid, is filled with zeros. That would also solve the issue with roughly the same performance (you wouldn't need the AND anymore).

Perhaps including both changes would be wise.

The question is, is the overhead of calling a function low or high relative to the cost of modInteger? If high, it might in the end not be faster than just doing modInteger! Of course, future improvements to the Plutus interpreter might change this.
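The comparison can at least be stated concretely off-chain. A sketch over Integer (which is the value both pipelines compute; on-chain the mask version would go through the conversion builtins):

```haskell
import Data.Bits ((.&.))

-- Wrap-around 64-bit addition, two ways: masking with AND (what the
-- convert / zero-extend / AND pipeline above computes) and plain
-- modular arithmetic. The two must agree; the open question is only
-- which is cheaper once builtin call overhead is costed.
addViaMask :: Integer -> Integer -> Integer
addViaMask x y = (x + y) .&. 0xFFFFFFFFFFFFFFFF

addViaMod :: Integer -> Integer -> Integer
addViaMod x y = (x + y) `mod` (2 ^ 64)
```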

L-as avatar Mar 10 '23 20:03 L-as

Hi, thanks for getting back to this. We haven't forgotten, we just have a bunch on our plate at the moment. I will try and get back to this before too long.

michaelpj avatar Mar 27 '23 14:03 michaelpj

Once more, apologies for taking so long to deal with this. However, we've now got some spare capacity and we'd like to make an effort to get this merged in the near future. Is this a good time for you to work on this again? We can commit to doing some things, like adding costing for the new functions, and maybe adding the new functions to the Agda metatheory and the Plutus Core specification. FYI, I have a branch where I've updated this to the current state of our main branch and added some of the basic costing functionality.

kwxm avatar Jul 20 '23 09:07 kwxm

I'm planning to do a complete review of the code over the next few days, but there are a couple of matters I'd like to raise right now. Firstly, I'm thoroughly confused about endianness. I think that all of the functions that involve positions in bytestrings operate the other way round from what's specified in CIP-0058. Let me spell things out so that we're all on the same page.

Firstly, I'll write the bytestring s given by pack [0x12, 0x34, 0x56] as 0x123456 for brevity (in our concrete syntax that would be (con bytestring #123456)). Our existing builtins like indexByteString and sliceByteString number the bytes from the left, so indexByteString s 0 would return 0x12 = 18. The semantics in the CIP writes bytestrings as a sequence of bytes S = s₀s₁...sₖ and S[i] denotes the byte s_i. The notation for bitstrings is similar.

The semantics for e.g. testBitByteString do what you'd expect: given a bitstring S as above, testBitByteString S i is 1 if s_i is set and 0 if it's clear. However, the implementation works from the other end: [(builtin testBitByteString) (con bytestring #800000) (con integer 0)] returns False but [(builtin testBitByteString) (con bytestring #800000) (con integer 23)] returns True. The behaviour of writeBitByteString is similar.

I think that the reason for this may be that the operations in Data.ByteString index bytes from the left but those in Data.Bits index bits from the right, so if you're trying to access the bit at position 8i+j in a bytestring of length n you have to adjust one of the positions: either start at the left and look at bit 7-j in byte i, or start at the right and look at bit j in byte n-1-i, and the latter may be more efficient if you need to look at a lot of bits. I can see arguments for starting the bit numbering at both the left (it fits with the way we index bytes in a bytestring) and the right (it fits with the way that you always index bits in a byte), but whatever we eventually do, the implementation and the specification should agree.
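The two indexing conventions can be sketched over a big-endian [Word8] model of bytestrings (function names are illustrative):

```haskell
import Data.Bits (testBit)
import Data.Word (Word8)

-- Bit i counted from the left of a big-endian byte list: byte i `div` 8,
-- bit 7 - (i `mod` 8) within it, because Data.Bits numbers bits from the
-- least-significant end. This matches the CIP's stated semantics.
testBitFromLeft :: [Word8] -> Int -> Bool
testBitFromLeft bs i = testBit (bs !! (i `div` 8)) (7 - i `mod` 8)

-- Bit i counted from the right, as the current implementation behaves.
testBitFromRight :: [Word8] -> Int -> Bool
testBitFromRight bs i = testBitFromLeft bs (8 * length bs - 1 - i)
```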

The issue of endianness becomes particularly important for conversions between bytestrings and integers. It isn't explicitly stated in the CIP, but it's clear from the semantics that the encoding of integers as bytestrings is supposed to be big-endian, with the leftmost bytes in the bytestring being the most significant hexadecimal digits in the corresponding integer (for example, it says that if S is a bytestring and Z is a bytestring consisting entirely of zero bytes then S and Z·S represent the same integer). Again though, the implementation uses a little-endian encoding: byteStringToInteger applied to 0x000001 yields 65536, not 1.

However, I think we need to support both big- and little-endian encodings of integers since both occur in practice, and specifically in cryptography. For example, EdDSA digital signatures use a little-endian encoding for (integers representing) field elements, but RSA, SHA-2, and BLS12-381 all use big-endian encodings.

We could supply two encoding and two decoding functions in Plutus Core, but that might make maintenance a bit tricky. A simpler solution would be to implement only one encoding (I'd prefer big-endian since that's arguably more intuitive, but that's just my opinion) and also supply a reverseByteString primitive: little-endian encoding could then be expressed as big-endian encoding followed by reversal, and little-endian decoding would be reversal followed by big-endian decoding (or vice-versa); furthermore the extra reversal steps could easily be hidden in higher-level languages targeting UPLC. A reverseByteString builtin could easily be implemented using Data.ByteString.reverse and would probably be very efficient, so there would be little extra overhead involved in changing endianness. Whatever we do, it would help to avoid confusion if the name(s) of the relevant builtin(s) mentioned the endianness, although integerToByteStringBigEndian would admittedly be unwieldy.
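A sketch of this proposal over a [Word8] model of bytestrings (names are illustrative): one big-endian codec, with little-endian obtained by reversal.

```haskell
import Data.List (foldl', unfoldr)
import Data.Word (Word8)

-- Big-endian decode: the leftmost byte is the most significant.
bsToIntegerBE :: [Word8] -> Integer
bsToIntegerBE = foldl' (\acc b -> acc * 256 + fromIntegral b) 0

-- Minimal big-endian encode (0 encodes as the empty byte list).
integerToBsBE :: Integer -> [Word8]
integerToBsBE = reverse . unfoldr step
  where
    step 0 = Nothing
    step n = Just (fromIntegral (n `mod` 256), n `div` 256)

-- Little-endian versions are just a reversal away.
bsToIntegerLE :: [Word8] -> Integer
bsToIntegerLE = bsToIntegerBE . reverse
```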


A second issue (as mentioned above) is that many applications use fixed-length encodings of integers (again this happens a lot in cryptography, and the fact that all of the bitwise operations proposed here require both arguments to be of the same length is relevant as well) but the integerToByteString function implemented here just outputs the smallest bytestring that can be used to represent the input, so for example the integer 0 is encoded as the empty bytestring. It would be possible for users to pad the output to the required width manually, but this would probably be quite expensive. As mentioned previously, this could either be done by supplying integerToByteString with an extra width argument (but then what do you do if you really don't care about the width?) or supplying a built-in padding function. The latter might be easier, although I have some reservations because it would allow you to create massive bytestrings very quickly, which might be problematic from the point of view of costing. We might also need "pad on the left" and "pad on the right" functions, although again having reverseByteString might let us get away with only one.


I note also that the CIP proposes both countLeadingZeroesByteString and countTrailingZeroesByteString, but in fact only findFirstSetByteString is implemented. This should be renamed (or the CIP should be changed), and the other function should also be implemented (although reverseByteString might be helpful here again). [Later] I had a closer look and it turns out that, as the name suggests, the semantics of findFirstSetByteString are different from those specified for the count... functions. Again it starts from the right, and it returns -1 if the input consists entirely of zero bytes; the count... functions return 0 in this case. Returning -1 might lead to an unexpected surprise if you're not careful: it's a pity we don't have Optional in UPLC. I guess that the choice of which function to implement depends on the intended use cases for this function.
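For reference, the implemented behaviour described above can be sketched over a [Word8] model (bit positions counted from the right, -1 on an all-zero input):

```haskell
import Data.Bits (testBit)
import Data.Word (Word8)

-- First set bit counted from the right (least-significant end) of a
-- big-endian byte list; -1 if every byte is zero.
findFirstSet :: [Word8] -> Int
findFirstSet bs = go 0
  where
    total = 8 * length bs
    go i
      | i >= total       = -1
      | bitAtFromRight i = i
      | otherwise        = go (i + 1)
    bitAtFromRight i =
      let j = total - 1 - i  -- the same bit, counted from the left
      in testBit (bs !! (j `div` 8)) (7 - j `mod` 8)
```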

kwxm avatar Jul 20 '23 10:07 kwxm

Note that Sundae would benefit dramatically from these bitwise operations. In particular, because transaction inputs get re-ordered, we need to provide an "indexing set" in the redeemer; but because of that, we need to check that each index is distinct, which involves an O(n^2) algorithm and eats up a huge portion of our execution budget.

With cheap bitwise operators, we could use a bit mask to track indexes we've seen before.
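The bit-mask idea can be sketched off-chain using an Integer as an arbitrary-width mask (on-chain this would be a bytestring manipulated with the proposed builtins; the function name is illustrative):

```haskell
import Data.Bits (setBit, testBit)

-- One left-to-right pass over the indexing set, recording each index in
-- a bit mask of indices seen so far: O(n) bit tests instead of the
-- O(n^2) pairwise check described above. Assumes non-negative indices.
allDistinct :: [Int] -> Bool
allDistinct = go (0 :: Integer)
  where
    go _ [] = True
    go seen (i : is)
      | testBit seen i = False
      | otherwise      = go (setBit seen i) is
```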

Quantumplation avatar Aug 07 '23 21:08 Quantumplation

@Quantumplation do you have a sketch of the code you would use? It's very important for us to actually try this out and make sure that the budget improvements do materialize!

michaelpj avatar Aug 08 '23 09:08 michaelpj

@michaelpj I can provide some pseudocode for what I'm doing now, how many execution units that costs, and some pseudocode for how I was thinking of using bitwise operators, but anything higher-fidelity will have to wait until I have some spare time.

Here's what I'm currently doing, more or less:

pub fn has_in_first_n(items: List<Int>, elem: Int, idx: Int) -> Bool {
  if idx == 0 {
    False
  } else {
    when items is {
      [] -> False,
      [head, ..rest] -> {
        if head == elem {
          True
        } else if idx == 1 {
          False
        } else {
          has_in_first_n(rest, elem, idx - 1)
        }
      }
    }
  }
}

pub fn get_sorted_orders(
  indexing_set: List<Int>,
  inputs: List<Input>,
) -> List<Output> {
  list.indexed_map(
    indexing_set,
    fn(idx, elem) {
      // Make sure the elements of the indexing set aren't repeated
      // by checking for each element in all the elements leading up to it
      expect !has_in_first_n(indexing_set, elem, idx)
      // Index into this list; in the real implementation I use an
      // unsafe_fast_index_skip that indiscriminately skips in large chunks
      // before actually bounds-checking the last few steps
      let input = list.at(inputs, elem)
      input.output
    },
  )
}

In a particular benchmark, the above takes a total of 3,603,290 mem and 1,431,345,727 steps.

If I comment out the expect !has_in_first_n, this drops to 876,338 mem and 336,497,835 steps, saving ~2.8m mem and 1.1b steps;

If I then change it to just loop over the list and return each input (to loosely measure the unsafe_fast_index_skip), it drops to 157,418 mem and 55,464,915 steps, roughly 700k mem and 280m steps.

It's worth noting, I am always bottlenecked on memory, and it's not even close.

Here's what I was thinking I could try with bitwise operators:

pub fn do_get_sorted_orders(
  indexing_set: List<Int>,
  inputs: List<Input>,
  seen_indexes: ByteArray,
) -> List<Output> {
  let [head, ..tail] = indexing_set
  let idx_bit = (1 << head)
  expect (seen_indexes && idx_bit) == 0
  [list.at(inputs, head), ..do_get_sorted_orders(tail, inputs, seen_indexes || idx_bit)]
}

pub fn get_sorted_orders(
  indexing_set: List<Int>,
  inputs: List<Input>,
) -> List<Output> {
  do_get_sorted_orders(indexing_set, inputs, 0)
}

Or something similar, depending on how the bitwise operators work. indexing_set is, right now, at most 19 orders, with the hope to push it further, but likely never hundreds or thousands of entries.

Quantumplation avatar Aug 08 '23 09:08 Quantumplation

Hopefully the above is at least somewhat helpful to you :sweat_smile:

On an unrelated note, it's really the O(n) list indexing that is absolutely wrecking performance everywhere. If we had O(1) list indexing, or dictionaries, it feels like it would open up entirely new possibilities for us.

Quantumplation avatar Aug 08 '23 09:08 Quantumplation

@kozross I've been doing some preliminary costing experiments with this branch by benchmarking the Plutus Core versions of the bitwise functions with inputs of varying lengths. Most of them seem to behave as you'd expect (execution time is approximately linear (or constant) in the length of the bytestring), but the results for integerToByteString and popCountByteString are a bit weird. Here's what the times look like for integerToByteString:

[graph: execution time of integerToByteString against input size]

The time increases linearly up to length 3247 (bytes) and then drops to a constant. I could believe that it might be constant because the layout of the bytes doesn't change when you convert to a bytestring (I tried looking at the GHC.Integer source, but wasn't able to confirm that), but why would it take longer for smaller inputs? The byteStringToInteger function (with the same data, or rather the translation of the same data) doesn't behave like this: the graph is pretty much a straight line with a few small discontinuities.

For popCountByteString there's a sudden change from one linear function to another:

[graph: execution time of popCountByteString against input size]

The jump is just after 7870 bytes.

I've produced similar results on three different computers and with different inputs, and I'm pretty sure the benchmarks aren't doing anything stupid (although I'll check again). Do you have any insight into what might be going on?

kwxm avatar Aug 24 '23 08:08 kwxm

The figures above were for the time taken to evaluate a Plutus Core term applying the builtin to a bytestring. I ran the benchmarks again and then benchmarked the Haskell function Bitwise.popCountByteString directly, and I still got a big discontinuity, although in a slightly different place. The blue dots below are times for the Plutus Core version, the red dots for the Haskell version.

[graph: popCountByteString, Plutus Core vs Haskell timings]

kwxm avatar Aug 24 '23 08:08 kwxm

It's similar for integerToByteString. I'm not sure why the PLC version takes so much longer than the Haskell version (apparently about 5-30x longer).

[graph: integerToByteString, Plutus Core vs Haskell timings]

kwxm avatar Aug 24 '23 11:08 kwxm

@kwxm - I have a few suspicions. 8KiB is the page size: this means you're paying an extra penalty on paging, which may explain that discontinuity. The conversion difference is much harder to explain: I'd have to check what I wrote originally.

kozross avatar Aug 24 '23 19:08 kozross

Ok, thanks. I'll check my benchmarks again to make sure that I've not made some silly mistake. We haven't seen anything like this for any of the existing builtins, but I'll re-benchmark some of them too in case some change (in the GHC runtime for example) has introduced some new behaviour.

kwxm avatar Aug 24 '23 20:08 kwxm

@kwxm - do the benches I wrote for those operations reproduce this behaviour? I haven't run them in a while, but I don't recall seeing such jumps. Maybe I just didn't test with large enough values?

kozross avatar Aug 24 '23 21:08 kozross

@kwxm - do the benches I wrote for those operations reproduce this behaviour? I haven't run them in a while, but I don't recall seeing such jumps. Maybe I just didn't test with large enough values?

It looks like they do, for popCountByteString at least (I don't think there are any benchmarks involving the eventual implementation of integerToByteString). I changed the sizes of the inputs to [10,20..1500] and there's a discontinuity for the "block C" implementation: see the graph ("block C" is red, "block unrolled C" is blue; the other two are pretty much straight lines, but they're much steeper so I left them out).

[graph: popcount implementation benchmark comparison]

kwxm avatar Aug 25 '23 03:08 kwxm

Definitely something I'd use - I use bitwise operators all the time in javascript and other languages. One particular use case that seems relevant - for my recent Smart Life NFT project I am generating Wolfram's rule numbers for 1-d cellular automata by directly converting between 8 individual boolean fields into an 8 bit int which represents the rule number.

kieransimkin avatar Aug 27 '23 17:08 kieransimkin

I now think the weirdness in the execution time for integerToByteString isn't so bad. It is a bit strange, but in fact this function is extremely fast (I haven't checked the details, but I think it may not actually have to do much work, perhaps just casting the type of a string of bytes) and a little irregularity in the cost may not be a big problem. It would be good to understand what's going on, though, in case something changes in the future.

kwxm avatar Sep 10 '23 12:09 kwxm

I've added costing code (and a few other things) in this branch and updated the cost model so that it assigns a non-zero cost to the bitwise operations. We're in the process of transitioning to a new benchmarking machine so the bitwise costs won't be totally consistent with the costs for existing operations; however, it'll at least give us ballpark estimates of costs for programs involving the bitwise operations.

kwxm avatar Sep 10 '23 12:09 kwxm