cardano-base
cardano-base copied to clipboard
Clear away psbFromBytes
As observed here, this function has some severe issues with referential transparency, and is thus generally not safe. While this PR deprecates both psbFromBytes (and psbZero) using a DEPRECATED pragma, as well as removing its uses internal to cardano-base itself, this function is still used internally to cardano-crypto-class:
psbFromByteStringIsString (PinnedSizedBytes n)
While these should both be modified to not use psbFromBytes, the IsString instance is rather horrifying: much like the same instance for ByteString, is hugely problematic for mostly the same reasons. I would propose removing this IsString instance wholesale, and replacing it with a quasi-quoter.
IsString (PinnedSizedBytes n)
Holy :hankey:
I would propose removing this IsString instance wholesale, and replacing it with a quasi-quoter.
This is a very good suggestion!
I have something of a personal usability stake in this, as a task I will have soon will be adding golden tests to ensure that the SECP256k1 serializations are behaving like we expect on a fixed set of examples. Having proper literal syntax (which checks my mistakes and doesn't do things behind my back) is definitely something I would want!
Furthermore, there's also a huge problem with the IsString instance: it produces _un_pinned data. Not only is this super counter-intuitive (the type is named _Pinned_ByteString after all), it's also dangerous, as it means I can't pass such literals to the FFI and expect them to behave themselves.
There are more functions in the PinnedSizedBytes module that have referential transparency issues:
psbFromByteString(implemented in terms ofpsbFromBytes)psbFromByteStringCheck(implemented separately, for some reason, but also has the potential to create surprising sharing/non-sharing behavior the same waypsbFromBytesdoes)pinnedByteArrayFromListN(though this one is not exposed, only used inside the module)
In a nutshell, if we want PinnedSizedBytes to be referentially safe in the face of mutations, then all construction and copying must happen in IO (or ST). psbCreate and friends already observe this restriction, so it's actually a bit strange that the offending ones don't.
So IMO the sane solution would be to:
- Change the offending functions to return
IO (PinnedSizedBytes n)instead ofPinnedSizedBytes, and eliminateunsafe{Dupable}PerformIOfrom their implementations - Remove the
IsStringinstance, and provide an alternative (psbFromString) implemented inIO, in terms of one of the fixedpsbFrom...functions - Hunt down all usages of these functions and change the call sites to deal with the
IO.
If you then still feel the need for a quasiquoter, it can be implemented fairly easily in terms of the new psbFromString function. It would largely be a cosmetic thing, except that you could add compile-time length checks (which isn't possible with plain string literals).
@tdammers - I mention psbFromByteString as a problem in the initial post, so 100% agreed.
psbFromByteStringCheck is actually implemented by making a copy, which should avoid the sharing behaviour. Is there something I'm missing here? The same applies to pinnedByteArrayFromListN as far as I can tell.
I'm not sure changing all of these to IO is necessarily what we want: we can still be safe as the API currently stands as long as we copy each time. psbFromByteString isn't even hard to fix in this regard, and the others, as far as I can tell, are not an issue unless I'm missing something. I'm strongly in favour of a quasiquoter, because checks on literal form (including length) are very desirable for constants in my opinion.
@lehins - thoughts?
I do agree with @kozross , there are no problems with psbFromByteStringCheck.
I do want to point out that the description of this ticket is a little misleading psbFromBytes, psbFromByteString and pinnedByteArrayFromListN have no problems with referential transparency. It is only when the produced frozen buffer is mutated, that is when the referential transparency is violated, which is exactly the expected behavior of frozen arrays. The problem comes from the fact that the produced PinnedSizedBytes were used incorrectly, i.e. unsafeThaw (or in our cases simply casted to as Ptr and mutated) is what the actual cause of the problem. As @kozross well pointed out, this is the correct solution:
we can still be safe as the API currently stands as long as we copy each time.
In fact the whole PinnedSizedBytes API is unsafe because it does not make the distinction between mutable and frozen state and allows for the buffer to be treated arbitrarily and it is left up to the user to be very careful about what is safe and unsafe. The only defense for such API I have is that all operations have to be done on Ptrs due to FFI, while preserving purity and referential transparency.
So, @tdammers switching all to IO will not help, in fact it will make things a lot worse, because all of the functions in the cardano-crypto-class library are used from pure non-IO code, which would lead for those unsafePerformIOs to travel up into use sites, thus making the codebase a lot more unstable. As far as I can tell current uses of unsafe{Dupable}PerformIO are safe, so we should leave them alone, but we could do a bit of work on making the API safer and I do have some ideas on that front.
Right, yes. I think what it boils down to is that we need to decide whether we want to present PinnedSizedBytes with value semantics (immutable) or with mutable-reference semantics (immutable reference to mutable value).
ByteArray already provides the reference semantics, so the right thing to do here is to maintain the pure API, but make it so that in-place mutations operate on copies.