plutus icon indicating copy to clipboard operation
plutus copied to clipboard

BuiltinByteString isn't opaque enough

Open treeowl opened this issue 4 years ago • 27 comments

Currently,

newtype BuiltinByteString = BuiltinByteString ByteString

The problem is that any user-written function that is strict in a BuiltinByteString (because of a bang pattern, seq, a strict fold, etc.) is likely to have the worker/wrapper transformation applied to it, tearing the ByteString into its constituents, which the Plutus compiler can't understand.

I believe the right solution is to make BuiltinByteString a datatype instead of a newtype:

data BuiltinByteString = BuiltinByteString ByteString

GHC might very well unpack the BuiltinByteString, but as long as it doesn't know that functions are strict in the underlying ByteString, it won't unpack that, and it should remain available to the Plutus compiler.

treeowl avatar Nov 05 '21 20:11 treeowl

Aha, thanks, this is useful. I've been suggesting that people use -fno-strictness when WW rips up bytestrings, but this might help us avoid needing that so much.

michaelpj avatar Nov 08 '21 10:11 michaelpj

I've started work on this, but I have a ways to go. To be really confident, I need to call GHC.Exts.lazy on each extracted ByteString.

treeowl avatar Nov 08 '21 13:11 treeowl

To be really confident, I need to call GHC.Exts.lazy on each extracted ByteString.

Hmm, that sounds like it's going to be a pain. Where would we need to put such annotations?

michaelpj avatar Nov 08 '21 14:11 michaelpj

It's not that bad. Just things like

BuiltinByteString (lazy -> bs1) <> BuiltinByteString (lazy -> bs2) = BuiltinByteString $! bs1 <> bs2

-- or

BuiltinByteString bs1 <> BuiltinByteString bs2 = BuiltinByteString $! lazy bs1 <> lazy bs2

I like the first way better because it's harder to make mistakes.

treeowl avatar Nov 08 '21 14:11 treeowl

Hmm okay. In fact the only place we should need to do that is in the "fake" implementations of the builtin operations in Builtins.Internal. Any other code that looks inside BuiltinBytestring simply isn't usable on-chain (and those functions should get replaced anyway - the main point would be to prevent GHC from trying to get clever with them).

michaelpj avatar Nov 08 '21 14:11 michaelpj

I also tried

newtype BuiltinByteString = UnsafeBuiltinByteString Any

approach, and it didn't work as then plutus-tx-compiler found Any instead of Addr#. That would allowed keeping the representation the same.

That said, few comments:


A pattern synonym would make lazy stuff simpler.

data BuiltinByteString = BuiltinByteString' ByteString

pattern BuiltinByteString :: ByteString -> BuiltinByteString
pattern BuiltinByteString x <- (lazyUnwrap -> x)
  where BuiltinByteString x = strictWrap x

lazyUnwrap :: BuiltinByteString -> ByteString
lazyUnwrap (BuiltinByteString' (lazy -> bs) = bs
 
strictWrap :: ByteString -> BuiltinByteString
strictWrap !bs = BuiltinByteString' bs

I noticed that Generic instance is used for NoThunks instance, I don't like that. Writing NoThunks instance manually would be better. (There's not much to write in this case).


While making this mistake, I got errors like:

src/Plutus/ChainIndex/Handlers.hs:140:22: error:
    Not in scope: data constructor ‘BuiltinByteString’
    Perhaps you want to add ‘BuiltinByteString’ to the import list
    in the import of ‘PlutusTx.Builtins.Internal’
    (src/Plutus/ChainIndex/Handlers.hs:63:1-79).
    |
140 | getTxFromTxId (TxId (BuiltinByteString txId)) =
    |     

which makes me uneasy. BuiltinByteString seems to be in "core" datatypes used by something where efficiency probably matters. Having an additional box doesn't feel great. But I don't have a better option, except Backpack to duplicate these types.

That might be very bad if Integer has to be wrapped too for the similar reasons.

phadej avatar Nov 08 '21 18:11 phadej

If I disable a worker wrapper transformation, I get

                         { Data.ByteString.Internal.PS _ [Occ=Dead] _ [Occ=Dead]
                                                       _ [Occ=Dead] _ [Occ=Dead] ->

case matches. It would be an improvement if plutus-tx-plugin would figure out that bindings are unused (there is occurence hint!), so there is no need to compile ByteString data-type, nor its field types.

It would be nice if worker-wrapper could be disabled just for ByteString, but I don't think that GHC has such precise knob.

EDIT: and it seems that in some scenarios disabling worker wrapper transformation actually makes PLC smaller, or the increase in size is small-ish.

phadej avatar Nov 08 '21 18:11 phadej

@phadej, I don't trust that Any technique; if GHC sees that it's coercing X -> Any -> X it'll simplify the coercion away. Using an explicit box and being lazy with its contents should be quite reliable; the outer box can be removed by worker/wrapper. The inner box should only affect off-chain code, and it shouldn't be too bad.

treeowl avatar Nov 08 '21 23:11 treeowl

if GHC sees that it's coercing X -> Any -> X it'll simplify the coercion away.

Isn't that good, kind of what a successful worker-wrapper transformation would do?

But it doesn't matter, as it doesn't work.

phadej avatar Nov 08 '21 23:11 phadej

If I can prevent the ByteString from being torn apart by worker-wrapper, what might I have to change in the Plutus compiler to let it work with either a BuiltinByteString or a ByteString? Or will that be hard? Here's another option that hopefully doesn't have that issue:

data BuiltinByteString
  = BuiltinByteString {-# UNPACK #-} !ByteString
  | Never !Void

To avoid fragility, we'll have to use -fno-worker-wrapper and/or -fno-cpr-anal in the Builtins module, but that's not nearly as bad as what we do now. I'm not 100% sure that the strict Void argument can't let GHC mess with us across module boundaries, but I haven't been able to get it to do so, even with aggressive constructor specialization.

treeowl avatar Nov 11 '21 08:11 treeowl

We moved away from using ByteString precisely to make it harder for people to use functions on ByteStrings that won't work, and then they get horrible errors. We want an opaque type. But we also want to be able to run the functions that get compiled on-chain off-chain as well, which is why we do back it by a real ByteString and provide implementations of the primitives.

I've lost track a little bit, what went wrong with just making it data?

michaelpj avatar Nov 11 '21 10:11 michaelpj

@michaelpj, the tricky thing about just making it data is that we'd then have to be prepared for the possibility that the constructor is unwrapped, and also the possibility that the constructor is not unwrapped. I just don't know how hard that will be to deal with. I'm about a minute from opening a PR that does it the sum type way. It'll be nice to see if that builds properly.

treeowl avatar Nov 11 '21 10:11 treeowl

I also have a version that tries to do the "just change to data" thing, but I am less optimistic that that'll work without some modification to the Plutus compiler. I can upload it anyway if you like.

treeowl avatar Nov 11 '21 10:11 treeowl

https://github.com/input-output-hk/plutus/pull/4393 aimed to fix this but still seeing problem in Vasil version https://github.com/input-output-hk/plutus/commit/a56c96598b4b25c9e28215214d25189331087244

Run cabal build with this broken branch to recreate issue when using !TokenName

See this StackExchange question for the compilation error.

catch-21 avatar Sep 26 '22 10:09 catch-21

Idea: What if we do

-- Maybe better if open? That way GHC might think it might have an instance?
type family Opaque :: Type -> Type where

{-# OPAQUE toOpaque #-}
toOpaque :: a -> Opaque a
toOpaque = unsafeCoerce

{-# OPAQUE fromOpaque #-}
fromOpaque :: Opaque a -> a
fromOpaque = unsafeCoerce

type BuiltinByteString = Opaque ByteString

Would this fix the issue?

L-as avatar Oct 08 '22 10:10 L-as

@L-as

Idea: What if we do

That is more or less what I proposed with Any

(EDIT: you want a newtype wrapper so you can write instances)

phadej avatar Oct 08 '22 10:10 phadej

I don't think we'd run into the same error, because the compiler would see Opaque ByteString instead of BuiltinByteString, which is usable while Any is not.

L-as avatar Oct 08 '22 11:10 L-as

... but you want be able to write any instances for a type family application... you need a (newtype) wrapper.

phadej avatar Oct 08 '22 11:10 phadej

You just do newtype BuiltinByteString = BuiltinByteString (Opaque ByteString) as you noted yourself? I don't see the issue.

L-as avatar Oct 08 '22 11:10 L-as

@L-as and then you essentially get newtype BuiltinByteString = BuiltinByteString Any, though different "Any" for different builtin types. Maybe there's merit in that, but all e.g. @treeowl worries are still present (which I don't understand myself).

phadej avatar Oct 08 '22 11:10 phadej

I caught the same error as described in the StackExchange question already linked above which seems related to this issue because of using exclamation signs (strict declaration) in my types.

I used cabal repl instead of cabal build which works fine with no issues and lets me write the script file. Shouldn't it be the same in sense of strictness?

Used plutus-apps tag "v1.0.0-alpha1"

Just an observation maybe it can help solve this issue.

adrian1-dot avatar Oct 21 '22 10:10 adrian1-dot

I can implement my suggestion if you want @michaelpj

L-as avatar Oct 21 '22 11:10 L-as

Is there a fix or workaround for this issue? I'm utilizing plutus-apps v1.0.0 tag.

Fiftyw3bs avatar Jan 11 '23 16:01 Fiftyw3bs

@michaelpj what is the status of this issue? It seems like it's been bugging people for more than a year now, should we ask someone to take a deep look at it and at least summarize the options that we have?

effectfully avatar Feb 01 '23 22:02 effectfully

Yes, it would be good for someone to take another look. I don't understand why our current solution isn't working and I'm reluctant to do something that makes it harder to work with :|

michaelpj avatar Feb 02 '23 12:02 michaelpj

@michaelpj

Yes, it would be good for someone to take another look. I don't understand why our current solution isn't working and I'm reluctant to do something that makes it harder to work with :|

I've created a Jira ticket for that (PLT-5534).

@Fiftyw3bs

Is there a fix or workaround for this issue? I'm utilizing plutus-apps v1.0.0 tag.

Do you happen to have an MRE? I see that we have one linked from Stack Exchange, but not entirely sure if it's still reproduces the problem.

effectfully avatar Apr 10 '23 14:04 effectfully

At the very least we should replace all the NOINLINE pragmas in the Plutus Tx builtins code with OPAQUE ones as that is a way more reliable way to prevent GHC from exposing the internals of a definition.

effectfully avatar Jun 18 '24 01:06 effectfully