nips icon indicating copy to clipboard operation
nips copied to clipboard

NIP-93: NSON

Open fiatjaf opened this issue 2 years ago • 24 comments

I am unsure if this is a really dumb idea and I'm missing some obvious issue with it, but anyway.

text: https://github.com/nostr-protocol/nips/blob/nip93-nson/93.md


update (Set 2024): I still think this is a valuable idea worth exploring, but I think specific encoding I used here could be improved.

fiatjaf avatar May 11 '23 00:05 fiatjaf

I like this alot, it feels like Flatbuffers, but backward compatible, I wonder what @hoytech thinks.

Maybe to make it even more Flatbuffers like, nson should

  • be at the very start of the JSON string (in case other static length fields were added to Nostr in the future).
  • state the number of bytes of every field instead of number of items, thus decoders can jump to the exact field they need, @hoytech probably can make the case better for Flatbuffers

Nuhvi avatar May 11 '23 08:05 Nuhvi

This introduces quite a lot of attack vectors. I think using binary encoding is still simpler solution: by trying to avoid introduction of "complexity" of binary encoding (which is actually simpler than JSON) you actually add complexity and increase the attack surface.

Just switch to CBOR or anything else out there with TLV support to remain flexible in message structure and plan depreciation of JSON encoding. I do not see how keeping JSON makes protocol simpler to use from a developer perspective. JSON is actually harder to use from my perspective.

dr-orlovsky avatar May 11 '23 10:05 dr-orlovsky

I did not ever claim binary encoding was more complex, in fact this own NIP mentions a binary encoding that is referred to as "simpler", but it turns out that most people are not comfortable with binary stuff and having to learn these things and write a decoder before starting to develop a Nostr app is a barrier probably too high. Also backwards-compatibility is very important, even though you don't care about it at all.

I'm interested in learning what attacks this introduces.

fiatjaf avatar May 11 '23 11:05 fiatjaf

I do not understand why people would need to write binary encoders and decoders if there is a plenty of libraries and implementations for standards like CBOR which will do the job for them

I'm interested in learning what attacks this introduces.

Any increase of program complexity introduces attack vectors, especially the one which creates a lot of new conflicting conditions (what if different form of data will be specified in a wrong way - how the software will behave? most of implementation may do a random memory access, overflows etc).

I am not doing any formal analysis of the attacks possible with the proposal, but taking into an account how many attack vectors exists for JSON (and how it is hard to do a JSON parser in a secure way) I think NSON approach doubles on that. If interested, you may see https://developer.apple.com/videos/play/wwdc2018/222/ which provides good introduction into the question

dr-orlovsky avatar May 11 '23 12:05 dr-orlovsky

ACK. I think @dr-orlovsky has some very real concerns/points. JSON is easier to wrap the head around, but obviously makes great sacrifices on scalability

I think NSON is a nice marginal move in the right direction vs JSON, and can keep the binary encoding convo ongoing

habibitcoin avatar May 11 '23 12:05 habibitcoin

Concept ACK. This looks like a good approach in the right direction.

Did you just come up with this on your own? I tried to read more about "NSON" but I haven't found anything. ChatGPT also doesn't know what NSON is, lol

Prompt: Do you know what NSON is? Answer: I'm sorry, but I couldn't find any specific information about "NSON" in my training data up until September 2021. It's possible that NSON is a relatively new or specialized term that has emerged after my knowledge cutoff. Could you please provide more context or information about NSON so that I can better assist you?

Just switch to CBOR or anything else out there with TLV support to remain flexible in message structure and plan depreciation of JSON encoding. I do not see how keeping JSON makes protocol simpler to use from a developer perspective. JSON is actually harder to use from my perspective.

I never heard of CBOR and I think many other people also didn't. JSON has become the de-facto standard for data exchange between languages, for configs, for APIs (application/json), ... This is just how it is and we have to deal with it. Using something else than JSON necessarily means that there will be less developers working on nostr since it's always a hurdle to learn something new. One if not the most important part of the nostr spec was how easy it was to understand. I agree with @fiatjaf there:

It is clear to me that Nostr wouldn't have had a tenth of the success it had if the original protocol had used a custom binary encoding, or even some standard efficient encoding method like protobufs or cap'n'proto. My opinion is that losing JSON compatibility would basically guarantee the failure of Nostr.

-- https://github.com/nostr-protocol/nips/pull/512#issuecomment-1542368664

ekzyis avatar May 11 '23 15:05 ekzyis

ACK. I think @dr-orlovsky has some very real concerns/points. JSON is easier to wrap the head around, but obviously makes great sacrifices on scalability

Maybe we just need ASICs to decode/encode JSON, lol (not sure if serious)

ekzyis avatar May 11 '23 15:05 ekzyis

In order to keep back-compat, the nson field is not signed, right? This means that relays and other middle-men can alter it. Now we need to worry about how clients handle nson that is inconsistent with the normal event structure. Just off the top of my head: what happens if a decoder trusts the nson's number of tags, and the middle-man reduces the number of tags in the nson by 1: Then the attacker would've successfully "deleted" the final tag from the message.

What are we even trying to solve here though? Yes, JSON parsing is slow, but compared to verifying the signature it's insignificant. Since we're talking about the external format of nostr events, everyone is verifying signatures upon receipt, right? After you've validated it, you're free to store it in whatever efficient format you want for later access. strfry uses flatbuffers, and yes @Nuhvi is right -- take me out for a beer and I'll be happy to evangelise that format to you at length. :)

If we were going to create a new incompatible external format (which, to be clear, I don't think we should do, it would be a disaster for the ecosystem), I would focus on reducing the payload size. I believe the absolute minimal valid nostr event is 342 bytes (pre-compression). Especially since a lot of events are literally like 1-4 bytes of payload (an emoji reaction or whatever), this is a ton of overhead. Most of the suggested binary formats don't even improve this very much, especially when factoring in compression. Things like msgpack and CBOR (which is basically just a rip-off of msgpack that a guy named C. Bormann named after himself) are especially pointless because they are schema-less, which means just pure overhead for a static format like this.

Here's what I would do in a brand-new format:

  • Get rid of id field. It's redundant, and must be re-computed upon receipt in order to validate the signature anyway.
  • Don't use schnorr signatures, but instead use a format where the pubkey can be recovered from the signature. This would let us get rid of pubkey field

These 2 changes would cut message overhead by 40% with no additional processing overhead (again, you are validating sigs on message receipt right?)

Note: I think schnorr signatures in theory do allow some cool batch verification stuff, which might actually be a big win when importing many events in a batch.

hoytech avatar May 11 '23 16:05 hoytech

and the middle-man reduces the number of tags in the nson by 1

This will be spotted once the receiver checks the id. The attacker can already modify the JSON directly, no need to modify the NSON field.

If we were going to create a new incompatible external format

But the purpose of NSON is to be compatible!

I don't think we should do

Me neither!

Especially since a lot of events are literally like 1-4 bytes of payload

I was think shrinking the event size is useless as the .content dominates everything, but I always forget these reactions, which to me are useless and I think any client should never request or create these. But I guess it's fair to say that this kind of event is the actual overhead, not the event format.

you are validating sigs on message receipt right?

Although I think validating everything is the default behavior I think it is not a totally bad approach for clients to assume relays are verifying, then sample event id and signature validation in the background in order to detect if there is some relay relaying fake messages eventually. If that happens the bad relay will be caught and expunged from society.

fiatjaf avatar May 11 '23 20:05 fiatjaf

I always forget these reactions

These are like 50% of all events now!

OK now I understand: You are optimising for the case where the client trusts its relays and isn't validating signatures, and the JSON decoding could plausibly be a bottleneck. Re-reading my message, I kind of came off a condescending jerk about that use-case, sorry about that hehe.

~Actually in this case there's no point in even checking the id either, and obviously the fact that the nson field is unauthenticated doesn't matter~ (edit: I was wrong about this, see my comment below). I think the NSON offset "hints" are a pretty clever idea, although it's yet more event-size overhead for clients that don't use them (which will be most of them, especially at first).

If JSON decode is a bottleneck, TBH I would push the libraries harder. Can you add rapidjson to your benchmarks too? I see there are some go bindings, but they will have some overhead too of course. There is a lot of room for optimising this. Previously I have worked on a code-base that parsed JSON (mostly flat with static fields, similar to nostr) using a custom ragel-based parser that was faster than rapidjson still.

hoytech avatar May 11 '23 22:05 hoytech

ACK in principle.

IMHO JSON is actually harder than binary in non-javascript languages, but we could squabble about that all day so I'm not here to debate it. But getting people and lots and lots of installed code base to all switch to binary is fucking impossible. So no switch to binary. But if we were going to do that, please involve me I have written a lot about what nostr should change if it could start over (in cold storage). And of course include @hoytech. The real world is not where you get to create your idealistic perfect notions of how things should be, it is a messy place where you have to navigate the best imperfect path.

mikedilger avatar May 12 '23 00:05 mikedilger

Concept ACK. Can we agree to merge this only after two relay implementations picked up on it? What does @jb55 say?

Giszmo avatar May 12 '23 00:05 Giszmo

@fiatjaf - I decided to verify your assumption that web clients check the id. If they aren't validating signatures, maybe they don't bother to do this step either. I modified my relay to return events with an incorrect id and tried to load this relay up in various clients:

  • snort: Displays the event
  • iris: Does not display the event
  • coracle: Inconclusive: can't add custom relays?

So at least snort is not bothering to validate the id. In the case of a client that trusts the relay and therefore does not validate signatures, this is actually fine security-wise (the relay could easily recompute the id anyway) and desirable from an efficiency perspective.

However, consider the following case:

  • Honest relay that doesn't know anything about the nson field, but simply relays it unchanged to users
  • Client like snort that trusts the relay and therefore does not validate the id, but is upgraded to use the nson field for parsing

A malicious user could take a legit event from a popular user, add or modify the nson field and upload it to the honest relay, which would accept it since the signature is valid. The client would download this event and use the unauthenticated nson to parse the data. The event could then be parsed and displayed incorrectly. For example, the attacker could truncate the beginning/ending of the content, remove tags, etc.

There are several possible mitigations:

  • Require even non-signature validating clients to validate the id, at least when they intend to use NSON (which means they now have to re-encode and hash the event, reducing whatever efficiency benefit they got from NSON).
  • Relays can be required to parse and validate the NSON
  • Alternatively, relays can be required to strip out unknown and therefore unauthenticated fields (strfry has always done this)

Any of the above mitigations could work, but in general I think using unauthenticated data at all is a bad idea. It's a variation of the cryptographic doom principle.

hoytech avatar May 12 '23 03:05 hoytech

Concept ACK. Can we agree to merge this only after two relay implementations picked up on it? What does @jb55 say?

if anything it should just be a different wire format that is just TLVs with null-terminated strings (like flatbuffers but we don't need that level of complexity). I'm not sure I understand the point of putting nson in the json. The biggest slowness in json is dealing with escaping and copying. If it was binary zero-copy over the wire then it would be insanely trivial to decode, and this would be backwards compatible and optional since you could signal it in the websocket request.

jb55 avatar May 12 '23 17:05 jb55

I'm not sure I understand the point of putting nson in the json.

It makes decoding the JSON insanely faster than using normal JSON libraries. Check the benchmarks. That includes copying, unescaping strings and parsing hex.

fiatjaf avatar May 12 '23 18:05 fiatjaf

I'm not sure I understand the point of putting nson in the json.

It makes decoding the JSON insanely faster than using normal JSON libraries. Check the benchmarks. That includes copying, unescaping strings and parsing hex.

I agree with this goal but it's still using JSON. why not just make this an optional negotiated wire format that is pure binary?

jb55 avatar May 12 '23 18:05 jb55

this was discussed 3 weeks ago on nostr as well:

note1rtra39wn6ke985uzsxj5y0xpyfgjtvc2e4g6jg4ghqgctmnj8m3sp3753q

jb55 avatar May 12 '23 19:05 jb55

In other news, this is how nson and my simple binary codec compare against JSON.parse in JavaScript:

2023-05-12-145627_700x542_scrot

Remember that in the Go benchmarks nson runs in 14% of the speed of the standard library, and the binary codec in 4%. In JavaScript it looks like the worse performance is on the binary codec by far.

fiatjaf avatar May 12 '23 20:05 fiatjaf

I put my vote behind this:

Require even non-signature validating clients to validate the id, at least when they intend to use NSON (which means they now have to re-encode and hash the event, reducing whatever efficiency benefit they got from NSON).

That is, in the NSON NIP, it should specify that clients MUST validate the id. This isn't a must if they don't use nson.

mikedilger avatar May 13 '23 03:05 mikedilger

@jb55 I proposed a binary encoding some time ago with #512, but we came to conclusion that it is better to experiment with it in a dedicated protocol. That's why we have started http://github.com/renostr initiative. It will provide:

  • binary encoding
  • end-to-end encryption
  • much faster sigs and SSH/GPG compatible keys with Ed25519
  • combination of both RPC-style of communications with PubSub (over HTTP+WebSocket and TCP)

All of these will be backward-compatible, i.e. it is an extension protocol over nostr. Main goal - help mobiles reduce power consumption, increase speed, better & standardized protection from DoS attacks.

Announcement: https://damus.io/note1d2y8jyrkzu46neuk8806nghx8vdutqznxkmglvqytzh0s7xtrjyq5rlnxa

dr-orlovsky avatar May 13 '23 07:05 dr-orlovsky

why not just make this an optional negotiated wire format that is pure binary?

Because the speed gains of binary over NSON do not justify the added complexity of keeping two formats around forever and the (very high) risk that some people decide to implement just the binary encoding, thus splitting the network.

fiatjaf avatar May 13 '23 10:05 fiatjaf

but it turns out that most people are not comfortable with binary stuff and having to learn these things and write a decoder before starting to develop a Nostr app is a barrier probably too high.

I think writing a custom JSON encoder is most likely a more complex task than concatenating bytes together. One of the problems with Secure Scuttlebutt that I perceived was serious was for example the need to reimplement the complex JSON-based event validation algorithm in every single implementation. This lead to people burning out before doing anything meaningful with the protocol and I know that what we are proposing is nowhere near that complex but concatenating bytes together still seems like an easier alternative.

boreq avatar Dec 19 '23 18:12 boreq

@boreq I know, the only point here is that this is backwards-compatible so you can keep calling JSON.parse() in your code.

fiatjaf avatar Dec 19 '23 18:12 fiatjaf

@fiatjaf this nip example is removed from go-nostr and i think it won't be logical to use this. i think we can close this pr.

kehiy avatar Sep 18 '24 16:09 kehiy