endo icon indicating copy to clipboard operation
endo copied to clipboard

fix(pass-style)!: only well-formed strings are passable

Open erights opened this issue 1 year ago • 15 comments

closes: #1739 refs: https://github.com/ocapn/ocapn/issues/47 https://github.com/tc39/proposal-is-usv-string

Description

At an OCapN meeting, we resolved https://github.com/ocapn/ocapn/issues/47 by deciding that only well-formed Unicode strings may be passed. A well-formed Unicode string does not contain any unpaired surrogates. It consists only of a sequence of Unicode code points. A conforming OCapN implementation MUST only emit well-formed Unicode strings, and MUST validate that incoming strings are well-formed, and reject those that are not.

Within the Agoric implementation, the way to implement these restrictions is by having passStyleOf(str) only judge well-formed string to be passable, rejecting all other strings as not passable.

If the underlying engine does not yet implement isWellFormed, we fall back to @gibson042 's shim implementation at https://github.com/ocapn/ocapn/issues/47#issuecomment-1908685478 .

Security Considerations

This change improves integrity because it reduces the attack surface. By rejecting incoming strings that are not well-formed, a counterparty cannot use such a string to push internal algorithms into cases they may not have tested.

Scaling Considerations

Before this PR, passStyleOf would judge a string to be Passable based simply on typeof being 'string', which is O(1). With this PR, the check will often be O(n) in the length of the string, depending on whether and how the underlying engine implements isWellFormed. If the underlying engine does not yet implement it, our shim implementation takes O(n). In theory, a builtin implementation might remember that a string is well-formed, enabling an O(1) test, at least after the first time. However, we are not aware of any such engine optimizations.

When the argument to passStyleOf is an object that it judges Passable, passStyleOf memoizes its judgement so it need only make the expensive check once. However, because of the impossibility of having a user-level weak data structure weakly indexed by strings, it is impossible for user-level code to do such memoization for strings. We should measure to see if this is a problem in practice.

Documentation Considerations

The restrictions on Passable strings should be documented.

Testing Considerations

The first time this PR ran through CI, with code enforcing the restriction, but not yet any code to test the restriction, it is unfortunate that CI came up green. This means that before this PR, when we were accepting non-well-formed strings by design, nothing tested that case, at least in a way that caused a test to break because of the added enforcement.

Compatibility Considerations

This change is in theory a compatibility break, in that previously non-well-formed strings were supposed to work. However, aside from possible tests specifically about non-well-formed strings, we do not expect any actual code to break. See @kriskowal 's note below at https://github.com/endojs/endo/pull/2002#issuecomment-1908857790

Upgrade Considerations

  • [ ] Includes *BREAKING*: in the commit message with migration instructions for any breaking change.
  • [x] Updates NEWS.md for user-facing changes.

erights avatar Jan 24 '24 19:01 erights

Under compatibility considerations, this will cause any existing program that is transmitting unpaired surrogates to break. I do not think we need to treat this as a breaking change, but we should note in NEWS.md that we are hoping that nobody is exercising the broken behavior, so that when someone upgrades and if their application breaks, they have a hint about why.

kriskowal avatar Jan 24 '24 20:01 kriskowal

I think this is a good change but it is marked as draft. Is that intentional? This is in my review inbox. Is this ready for review?

kriskowal avatar Jan 24 '24 20:01 kriskowal

I think this is a good change but it is marked as draft. Is that intentional? This is in my review inbox. Is this ready for review?

It lacks only tests, filling out the PR comment and checklist, and NEWS.md text.

I do not think we need to treat this as a breaking change

Should I remove the "!"?

erights avatar Jan 24 '24 20:01 erights

It also lacks adequate comments in the code

erights avatar Jan 24 '24 20:01 erights

"!" removed

erights avatar Jan 24 '24 21:01 erights

Under compatibility considerations, this will cause any existing program that is transmitting unpaired surrogates to break. I do not think we need to treat this as a breaking change,

I removed the "!". Done.

but we should note in NEWS.md that we are hoping that nobody is exercising the broken behavior, so that when someone upgrades and if their application breaks, they have a hint about why.

Done.

Given that I removed the "!", just confirming that I should not Includes *BREAKING*: in the commit message, letting NEWS.md do the work. Yes?

Is this ready for review?

yes

It lacks only tests,

Done

filling out the PR comment and checklist,

Done, with the first box unchecked.

and NEWS.md text.

Done

It also lacks adequate comments in the code

Done

erights avatar Jan 25 '24 00:01 erights

With this PR, the check will often be O(n) in the length of the string

How often? What's the wall-clock impact on the Agoric blockchain? I hope we don't land this until we have pretty clear data on that.

dckc avatar Jan 25 '24 00:01 dckc

I do not think we need to treat this as a breaking change ...

I don't understand why not. It's clearly a change that's observable from clients.

dckc avatar Jan 25 '24 00:01 dckc

I don't understand why not.

@kriskowal , I leave that to you. I'm willing to go back to "!" if you think I should.

erights avatar Jan 25 '24 00:01 erights

How often? What's the wall-clock impact on the Agoric blockchain? I hope we don't land this until we have pretty clear data on that.

While I agree, can someone else take on actually measuring this?

erights avatar Jan 25 '24 00:01 erights

@gibson042 , thanks for measuring that!

erights avatar Jan 25 '24 21:01 erights

I do not think we need to treat this as a breaking change ...

I don't understand why not. It's clearly a change that's observable from clients.

This is a tree falls in the forest situation. Unpaired surrogates are only likely to show up if you’re doing something silly like abusing strings as Uint16Arrays (I admit having seen this before Uint16Array). So, I agree this is observable and I’ve asked around for whether it would actually be observed. If we can guarantee not, it would be nice because bumping the major version increases chances of contracts retaining multiple versions of this package and obliges us to back-port security patches.

I am also fine either way. Short of asking @mhofman to replay the chain with this change, I don’t know a way to definitely prove that this would not break the chain. And, I also don’t know whether that’s germane.

kriskowal avatar Jan 25 '24 23:01 kriskowal

At https://github.com/endojs/endo/pull/2008#pullrequestreview-1844906628 we decided to add the "!" back in to this PR.

Done.

Now I need to add an appropriate *BREAKING*: to the commit message.

erights avatar Jan 26 '24 02:01 erights

I'm interested to learn whether the breaking change judgement is more of a math problem (does there exist a test by which I could observe it?) or a cost-benefit / policy sort of thing.

This is a tree falls in the forest situation. ... I’ve asked around for whether it would actually be observed. If we can guarantee not ...

A guarantee would involve a closed-world assumption, yes? i.e. an assumption that we can somehow find all the clients. Is our working model that we can, in fact do that?

Thought experiment: have clients send a sort of "warrantee registration card" - if you let us know that you depend on our stuff, we'll make every effort to consider your stuff when judging breaking changes etc.

dckc avatar Jan 26 '24 04:01 dckc

likely to be replaced by native code in the relatively near future.

I'd like to see 2 issues:

  1. that tracks integration of s.isWellFormed() in xsnap (i.e. upgrading XS version, adding a test, ...)
  2. that tracks shipping of that version of xsnap on chain, with a dependency so that it's scheduled before metering

Maybe those (or issues that subsume them) exist already?

dckc avatar Jan 26 '24 16:01 dckc

@kriskowal @gibson042 , I hid this new feature behind a feature flag, defaulting to disabled, so we could include it in the upcoming endo release, to start experimenting with in from agoric-sdk. Even though you already approved, it is a big enough change that I'd appreciate it if you could PTAL before I merge. Thanks!

erights avatar Mar 13 '24 20:03 erights