secp256k1 icon indicating copy to clipboard operation
secp256k1 copied to clipboard

Remove lax DER parsing functions from contrib

Open real-or-random opened this issue 3 years ago • 9 comments

This is an attempt at solving #758 and this is up for discussion.

The underlying issue is that downstream copied the code of ecdsa_signature_parse_der_lax to their files (as initially intended). However later on, their copy and our copy evolved differently apparently without anyone being really aware of this or pointing this out, see https://github.com/bitcoin/bitcoin/pull/19228#issuecomment-641795558 for a detailed overview. This is bad for the obvious reasons that duplicate code should be avoided. Moreover, this turned out to be surprising, at least for me. For example, I was super careful in #578 because I and possibly also some of the reviewers assumed I'm touching (what will become) consensus code but in fact I was not (plus I reinvented some of the changes done already downstream). I can imagine I was not the only one who was confused and I guess the mere existence of the same function defined multiple times in the Bitcoin Core codebase in different ways has confused reviewers in the past because they may have been looking at the wrong code.

My initial plan was to take the downstream changes and apply them here, and then take any remaining differences and apply them upstream to make sure that the copies are in sync. However, I don't think that's the best solution because it doesn't fix the underlying code duplication issue, and it would waste reviewer cycles (for touching consensus code that does not need to be touched).

My second plan was to propose a change to downstream that would simply include our contrib file. But that's arguably against the spirit of the contrib files as @sipa pointed out. These functions rather serve as "documentation". However, if that's the primary purpose, we don't need to keep them here in our repo but we can just refer to their downstream versions instead, in particular because we're a subproject.

So taking all of this into account and after discussion this with @sipa, it seems that the cleanest way is to remove these files and simply point users to the downstream code. This also makes clear who (primarily) owns this code. This PR add the pointers to downstream now in our API help, which is also easier to find than the nowhere-mentioned contrib dir.

One seemingly strange thing about this PR is that it first tidies the DER seckey parsing function a little bit (renaming arguments and improving docs) and also gives the files better names, but then just removes the files. This is for historical reasons: The tidying commits are from my "initial plan". But since I think these changes are real improvements (and should be applied to downstream -- note that this touches only seckey parsing, not sig parsing) I kept them in here for reference. I can squash this of course, if people prefer this.

edit: also note that we explicitly didn't guarantee the existence of these functions https://github.com/bitcoin-core/secp256k1/blob/master/contrib/lax_der_parsing.h#L8

real-or-random avatar Jul 29 '20 10:07 real-or-random

FWIW this will break rust-secp256k1 https://docs.rs/secp256k1/0.17.2/secp256k1/struct.Signature.html#method.from_der_lax (not saying it is a reason to leave it here, just to let people know that this will require us to either remove it from rust-secp or copy the code from bitcoin core there (or reimplement it in rust))

elichai avatar Jul 29 '20 11:07 elichai

I don't think removing it would be doing anything to benefit the world.

Instead, it would make the library less useful to users who need a quasi-DER decoder. Creating a mostly openssl compatible DER-ish decoder took an extraordinary amount of work and is useful to people beyond Bitcoin: essentially everyone working with ECDSA in pre-existing applications needs one. Absent one, they'll cook something up and it's extremely likely that it will be less good than any version of this one.

Now, it's true that they could go fish the code for this out of the bitcoin codebase, except that is a C++ program (and is not currently valid C if copied and pasted out of it-- though at the moment it's not hard to fix) and it is maintained under different and narrower purposes. Unlike this code base, the code in bitcoin is in active use and will continue to get randomly revised according to whatever style trend is currently motivating activity in that codebase.

I am not aware of any differences in the Bitcoin's version which are a material improvement over the current version here. Surely if there are any they're trivial.

What is the problem with just leaving it alone? Is there really something to fix? -- From my perspective this code is done. It was essentially done when it first went in-- though your subsequent changes were fine ones--, and if it never saw another edit in the next 20 years that would probably be just fine.

Sure, other things exist elsewhere-- that's usually the case. In deciding to do something or not minimizing code duplication is probably one of the least important objectives. If it was one of the most important objectives this library wouldn't exist at all-- after all, openssl already existed. Many entire programming languages wouldn't exist, nor would many of the packages people write for them which are just slavish reproductions but in language X.

Perhaps I made a mistake in encouraging your original revision to this code and I apologize if you feel you wasted your time as a result-- but you did such a lovely job with the commit message that I was seduced by by your PR. I also hoped that this repository would be a lot more active than it has turned out to be, and little maintenance changes can be a part of keeping up a good cadence of activity-- productivity begats productivity. As a parsing function it also was an opportunity to solicit involvement from other community members that ordinarily don't review PRs here, as well as to review best practices for pointer handling-- review that I think I personally benefited from--, and I think it was a success from those perspectives.

There are a lot of benefits that come from the execution of good development practices beyond just the code itself, including the community and culture that it fosters, the pride in good craftsmanship, or simply setting things up in a way that are easier and less surprising for users. Part of the reason this repository is separate from Bitcoin is because we expected (and hoped for) other users, sending them back to fish parts out of bitcoin would be a step backwards in that respect. Presumably you wouldn't want this project to also rip out sha256 because its duplicated and they can get it from Bitcoin (and in various platform specific accelerated forms too!).

gmaxwell avatar Jul 29 '20 12:07 gmaxwell

What is the problem with just leaving it alone? Is there really something to fix? -- From my perspective this code is done. It was essentially done when it first went in-- though your subsequent changes were fine ones--, and if it never saw another edit in the next 20 years that would probably be just fine.

I don't think there is a definitive answer to what's best here. Let me summarize the main points again:

I believe everything that we ship (in the repo, as a header, as docs etc.) should either be maintained or removed, and consequently I feel responsible for maintaining this as a maintainer. The consequence is that it's wasted work to maintain the same thing in two places.

For example I consider it a mistake that the contrib files have been overlooked in #701. And this is an interesting example. Apparently noone bothered to point it out there, so either everybody was aware and didn't care even enough to mention it (unlikely), or even the regular contributors here missed it. This demonstrates the status of this code.

Apparently you don't share my belief, at least in the case here. But independently of this, the fact that it has confused people in the past means that it may confuse people in the future. And that's a real problem, even though it may not be a huge one.

real-or-random avatar Jul 29 '20 13:07 real-or-random

If you asked me if I thought #701 should apply to contrib I would have said probably not (among other things "DER encoded private key" is actually what it's called, changing that to secret would make it more consistent with the codebase but less consistent with the world) but if a contributor who was working on it wanted to do it then I would have been willing to review it for them. I don't consider it a mistake to have not changed the contribs, but I also wouldn't consider it a mistake to change the contribs.

There is a lot which is really subjective and I think that when things are actually subjective it's fine do let it get decided by whatever whoever is active feels best about, at least so long as it isn't producing too much distracting churn.

I don't think the library has been particularly well maintained in the last couple years, but I don't think removing the contrib parsers will help that. All I know is that I tried really hard to change things and found myself sending PRs into the void with little activity from other people contributing and that because of the norms established here without people to actively review changes on a prompt basis it was practically impossible to maintain it to any level of standard worthy of respect. And I felt stupid feeling that I maybe should be considering recommending people not use the library because it wasn't being maintained when in theory I could do something about it but felt my efforts to try to do so hadn't been a success.

To the extent that there are actually broken parts that aren't getting maintained then-- sure-- removing them might be the best fate for them. E.g. I think it was good to remove the JNI bindings, which had serious problems and which effort to get someone to clean them up was unsuccessful. Language bindings usually have their own language specific weird issues that make them kind of burdensome regardless.

But some of this is just kind of ludicrous. Wumpus wrote the arm asm years ago and it is a massive performance increase (40%) on devices that critically need more performance. But it's effectively unmaintained: As I just discovered (though apparently it was known) it has build problems on some mainstream compilers, and there are no arm targets in CI at all-- it doesn't seem like the active contributors do much testing locally-- and the issue I opened for that after discovering it instead of being addressed by people who do stuff with travis was detagged as not important for a release.

So should arm support be dropped? I certainly don't think so! Yet it actually does take significant work to maintain, and actually has current problems-- unlike the stuff in contrib. To me worrying about contrib feels about as useful as rearranging the deck chairs on an unsinkable ship that is in the process of sinking. :)

gmaxwell avatar Jul 29 '20 13:07 gmaxwell

But some of this is just kind of ludicrous. Wumpus wrote the arm asm years ago and it is a massive performance increase (40%) on devices that critically need more performance. But it's effectively unmaintained: As I just discovered (though apparently it was known) it has build problems on some mainstream compilers, and there are no arm targets in CI at all-- it doesn't seem like the active contributors do much testing locally-- and the issue I opened for that after discovering it instead of being addressed by people who do stuff with travis was detagged as not important for a release.

So should arm support be dropped? I certainly don't think so! Yet it actually does take significant work to maintain, and actually has current problems-- unlike the stuff in contrib.

I don't think ARM support should be dropped. But yes, it's kind of sad. Apparently there are even bugs in the x86 ASM and noone with the right skill set seems to care (#766). I mean I could learn about GCC ASM first but that's a lot of effort of course.

I think what I'm trying to do is to assess usefulness vs maintenance burden:

  • ARM ASM: very useful, large maintenance burden. -> worth keeping (and this means we should try to get someone who can maintain it.)
  • contrib: not useful at all (because duplicated in Core) and small maintenance burden. -> we can just remove it and the maintenance burden is gone

I know you may assign different usefulness and burden ranks.

To me worrying about contrib feels about as useful as rearranging the deck chairs on an unsinkable ship that is in the process of sinking. :)

I can understand that. Is it then correct to say that you're not opposed to that change but rather indifferent?

real-or-random avatar Aug 07 '20 10:08 real-or-random

It's fine by me. I hope it comes back later in some documentation or example or something like that.

gmaxwell avatar Aug 07 '20 19:08 gmaxwell