webpki Accept certificates with invalid serial number encodings

We've noticed an issue which we think is due to webpki failing to parse a cert that was rewritten by an SSL inspection system.

I wrote up a PR that exposes the issue, and provides more details: https://github.com/briansmith/webpki/pull/231

See https://github.com/pantsbuild/pants/issues/12000 for context. Note that none of the participants on that issue, including yours truly, are experts at TLS (or Rust). So this is about as far as we were able to get.

Thanks!

May 07 '21 17:05 benjyw

cc @cjntaylor @tdyas

May 07 '21 17:05 benjyw

I think what is happening here is: the serial number is incorrectly encoded, and so is negative. This is illegal -- RFC5280:

   The serial number MUST be a positive integer assigned by the CA to
   each certificate.  It MUST be unique for each certificate issued by a
   given CA (i.e., the issuer name and serial number identify a unique
   certificate).  CAs MUST force the serialNumber to be a non-negative
   integer.

In the case of the test given here, the serial number is -0x3FC1DAC87904ACD759C456F2816694AC. It's worth noting that if this certificate were issued by a real CA, it wouldn't be allowed by the baseline requirements and would be a misissuance:

CAs SHALL generate non‐sequential Certificate serial numbers greater than zero (0)

Though naturally a private CA is not subject to these requirements.

May 07 '21 18:05 ctz

Related: golang relaxed this in https://github.com/golang/go/commit/a0ea93dea5f5741addc8c96b7ed037d0e359e33f to deal with the same issue -- see eg https://stackoverflow.com/questions/25526870/x509-parsing-error-negative-serial-number-while-pulling-repository and https://github.com/golang/go/issues/8265

May 07 '21 18:05 ctz

Thanks for the quick response! Makes total sense.

Are you amenable to the argument from the golang case that "This buggy software, sadly, appears to be common enough that we should let these errors pass"? :)

May 07 '21 18:05 benjyw

@cjntaylor @benjyw The most correct solution is to fix the broken certificate, assuming the serial number is malformed as was explained to me. Previously people have reported bugs against the webpki project because webpki's parser is really strict and we've been successful in convincing people, e.g. the AWS certificate authority team, to fix the bug on their end. Can you say which buggy software created the malformed certificate?

May 07 '21 19:05 briansmith

According to the PR, the system is Focepoint NGFW Enterprise Firewall (https://www.forcepoint.com/product/ngfw-next-generation-firewall). Let's find a contact with them.

May 07 '21 20:05 briansmith

I agree that we should definitely report this bug to them. The problem is that this is enterprise software. So even if they agree that this needs fixing, we'd be waiting on their release, which could take months, followed by the enterprise IT department agreeing to and then executing on the upgrade, which could take months or even years longer.

On Fri, May 7, 2021 at 1:03 PM Brian Smith @.***> wrote:

According to the PR, the system is Focepoint NGFW Enterprise Firewall ( https://www.forcepoint.com/product/ngfw-next-generation-firewall). Let's find a contact with them.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/briansmith/webpki/issues/232#issuecomment-834740159, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAD5F7HF5PPEB2IV37XMASLTMRBR7ANCNFSM44KT2EBQ .

May 07 '21 20:05 benjyw

My 2 cents: tell users that encounter this problem run with system TLS on their own fork. It will be less uniform across platforms, less reproducible, and subject to a variety of issues that Rustls/webpki doesn't have. This is the line to change:

https://github.com/pantsbuild/pants/blob/af5d5a297ab4f513dd98850d7fab229021d847aa/src/rust/engine/Cargo.toml#L126

You'll get whatever version of OpenSSL is on a Linux system, for instance.

May 07 '21 21:05 sayrer

I agree that we should definitely report this bug to them. The problem is that this is enterprise software. So even if they agree that this needs fixing, we'd be waiting on their release, which could take months, followed by the enterprise IT department agreeing to and then executing on the upgrade, which could take months or even years longer. … On Fri, May 7, 2021 at 1:03 PM Brian Smith @.***> wrote: According to the PR, the system is Focepoint NGFW Enterprise Firewall ( https://www.forcepoint.com/product/ngfw-next-generation-firewall). Let's find a contact with them. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#232 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAD5F7HF5PPEB2IV37XMASLTMRBR7ANCNFSM44KT2EBQ .

@benjyw beat me to it. Forcepoint is managed at an enterprise scale. It's not a tool I *choose* to use - it's forced upon me anytime I make ANY ssl connection inside my company's intranet. Optimistically, if you can get Forcepoint to agree to fix this issue (if they even agree) in 6 months, and I can get a whole different department I'm not even a part of with a whole different release cycle to redeploy what amounts to a critical piece of security infrastructure for more than 8000 staff in 6 months (both of those time periods are incredibly generous, especially the latter with all of the testing and verification that would absolutely have to happen), we're looking at a year, at best, before I could even realistically hope to have this fixed.

In the meantime, I'm dead in the water. I can't do anything that requires using webpki inside our intranet because EVERY SSL connection is rewritten this way. There is no workaround. No mitigation. No alternative. Thats the point. This is security software - it's used to do stateful inspection of SSL traffic to prevent malicious transmission - there can't be any alternative or it won't do what it's intended to do. And believe me, even though I'm not involved in it's management, I'm in touch enough with the folks involved to know that we've had known malicious actors in the network - that's a big part of why it's in place.

I guess I just don't understand why there can't be any flexibility here. I perfectly understand that the RFC makes the negative serial number invalid. I also know that Forcepoint isn't the only tool that generates negative serial numbers, it's incredibly common, especially amongst Microsoft SSL tooling. There is a point at which following the RFC is detrimental to the practical usage of a tool. As it stands, I'll have to put out a warning to our entire staff not to use webpki for any purpose (The Johns Hopkins Applied Physics Laboratory, feel free to look us up). That's what you're choosing here; the grand majority of the remaining ecosystem of SSL systems (based on OpenSSL) are permissive to this RFC violation so you're choosing to stand alone on this issue.

All this said, I don't disagree that we should fix this the right way. All I'm asking for is a bit of consideration. An option? A flag? Some way to run in "slightly less strict mode"? At least during the transition period while we wait what will be years for the right fix to be put in place.

May 07 '21 21:05 cjntaylor

My 2 cents: tell users that encounter this problem run with system TLS on their own fork. It will be less uniform across platforms, less reproducible, and subject to a variety of issues that Rustls/webpki doesn't have. This is the line to change:

https://github.com/pantsbuild/pants/blob/af5d5a297ab4f513dd98850d7fab229021d847aa/src/rust/engine/Cargo.toml#L126

You'll get whatever version of OpenSSL is on a Linux system, for instance.

"tell users that encounter this problem to build, manage and distribute an entire ecosystem of the tool they want to use to fix a non-obvious mostly invented problem of a downstream package that is unique to a second-order transitive dependency of the tool" FTFY

Telling users to build from scratch and manage a complex ecosystem of packages to fix your own issues isn't a solution. Way to kick the can. Thanks. I hate it.

May 07 '21 21:05 cjntaylor

Hey, I think our messages overlapped, and I didn't realize you were so frustrated.

That said, all Cargo packages are built from scratch, and I think you would just need to maintain a patch file (either in Cargo or external to it) to remove the "rustls-tls" feature from reqwest. That approach has the downsides I mentioned, but I believe it would work.

May 07 '21 21:05 sayrer

@cjntaylor In case it isn't clear, I didn't close this issue. It's still open. So, implementing a workaround hasn't been rejected.

May 07 '21 21:05 briansmith

Looking at the golang PR https://github.com/golang/go/issues/8265, they note that RFC 5280 states in Section 4.1.2.2:

The serial number MUST be a positive integer assigned by the CA to each certificate. It MUST be unique for each certificate issued by a given CA (i.e., the issuer name and serial number identify a unique certificate). CAs MUST force the serialNumber to be a non-negative integer.

Given the uniqueness requirements above, serial numbers can be expected to contain long integers. Certificate users MUST be able to handle serialNumber values up to 20 octets. Conforming CAs MUST NOT use serialNumber values longer than 20 octets.

Note: Non-conforming CAs may issue certificates with serial numbers that are negative or zero. Certificate users SHOULD be prepared to gracefully handle such certificates.

The last paragraph suggests providing a way to deal with the gracefully would be a good option to have.

To that point, would webpki be willing to either:

Expose an opt-in config option that would allow callers to relax the strictness requirement around positive serial numbers?
Is there a way to delegate that decision to the caller via a trait defined in webpki?

May 07 '21 21:05 tdyas

Hey, I think our messages overlapped, and I didn't realize you were so frustrated.

That said, all Cargo packages are built from scratch, and I think you would just need to maintain a patch file (either in Cargo or external to it) to remove the "rustls-tls" feature from reqwest. That approach has the downsides I mentioned, but I believe it would work.

Apologies, they did a bit, and I'll rein it in a bit. It's a little more complicated than that with pants given how it's built and maintained, and would put a great deal of burden on anyone wanting to use the fork. Yes, that would work - it's just the principle of the thing. Of course I can fix this problem by removing the dependency on webpki. But isn't that "curing the disease by killing the patient"?

May 07 '21 21:05 cjntaylor

As relevant background, Pants uses Rust internally by embedding a shared object file into a Python wheel. This Python wheel is then installed into a Python virtual environment that is then used to run Pants as the build tool for a user's repository. Given this distribution model, it is a hard ask to have users build custom versions of Pants.

May 07 '21 21:05 tdyas

That said, Pants has had to fork certain crates in the past and could do so here, but I would rather find an alternative before taking on that maintenance burden.

May 07 '21 21:05 tdyas

We don't need more advocacy discussion for or against here.

May 07 '21 21:05 briansmith

I don't think it makes sense to add a "strict mode" vs "non-strict" mode flag for this. It's more work, more code, but the strict mode doesn't seem like more security, so it seems like a net loss. If the default were strict mode then every user that would run into such issues would have to do work to cope with it. So I think the main choice in the short term is to be lenient always or strict always. I am open to the possibility of matching what Chromium and Firefox do, since generally when we compromise on strictness that's the upper limit on our compromise.

I am working on some testing infrastructure that will make it easier to accept a PR that would implement a workaround for this. That will need to be done first.

May 07 '21 22:05 briansmith

@briansmith Thanks! Appreciate the effort and the flexibility.

May 07 '21 22:05 benjyw

I think what is happening here is: the serial number is incorrectly encoded

Just to confirm this: I cleared the sign bit in the serial number and webpki was able to parse it. This is likely the only problem with this cert rather than the tip of an unpleasant iceberg.

May 08 '21 14:05 ctz

Just to confirm this: I cleared the sign bit in the serial number and webpki was able to parse it. This is likely the only problem with this cert rather than the tip of an unpleasant iceberg.

That's a relief!

May 11 '21 04:05 benjyw

Hi, just checking in to see what it would take to get this resolved, and how I/we might be able to help. There was mention of needing some testing infrastructure , not sure what the status is on that. Thanks!

May 21 '21 06:05 benjyw

Hi, gentle reminder that we'd love to see this resolved and are happy to help make it happen. Thanks!

Jun 07 '21 12:06 benjyw

An ASN.1 integer that has its highest bit set is a negative value. Serial numbers cannot be negative. Serial numbers are used at least for revocation (CRL, OCSP) and duplicate detection and maybe other things. We don't implement revocation checks yet but that is planned. With that in mind, we need to have a clear understanding of what to do when the serial number appears to be negative. Here's my proposal:

Do not consider negative serial numbers to be a thing. Instead, assume that if a serial number is negative according to ASN.1 encoding rules, that really the producer intended it to be positive, but forgot a leading 0x00. That is, we would only deal with positive serial numbers. We would continue to reject the serial number 0. Our parser for serial numbers would consider the leading 0x00 to be optional if and only if the next byte is 0x80 or higher. If the leading byte is zero and the next byte is not 0x80 or higher then we would reject the certificate as we do today.
If/when in the future we expose an accessor for serial numbers, we should expose the serial number as a type SerialNumber. The SerialNumber type reference/contain the serial number with any optional leading 0x00 already stripped. Then PartialEq and Eq can be derived.
If/when we implement revocation mechanisms or any other serial number parsing, we'll need to use this special serial number parser and do equality tests using the PartialEq implementation of SerialNumber. In particular, it is important that we consider for revocation checking that one serial number encoded with the leading 0x00 and the other encoded without the leading 0x00 to be equal so that one can use an OCSP/CRL response with correctly-encoded serial numbers to revoke an incorrectly-encoded certificate.
RFC 5280 says that the maximum length of a serial number is 20 octets. Different people have interpreted this to include or exclude leading zeros. We would (continue to) interpret this as disregarding any leading zero.

The above proposal assumes that its serial number comparison rules are sufficient to make it impossible for an attacker to confuse us, so that there would be no security value in a "strict" mode that rejects serial numbers that are, according to standard ASN.1 encoding rules, negative.

In order to implement this, we would start by copy-pasting ring's io::der::positive_integer (and nonnegative_integer) and its tests into webpki in an initial PR. Then we'd follow that PR with one that renames positive_integer to serial_number and implements the rules above, modifying the tests accordingly. I will work to get the additional testing framework into webpki ahead of time so that the tests for the latter PR would also include tests of validation of complete certificate chains containing certs with negative (acording to ASN.1 rules) and zero serial numbers.

WDYT?

Jun 16 '21 19:06 briansmith

@briansmith I've read your proposal and I generally like the idea of supporting negative serials, even though the spec forbids them. But I'm a bit critical of the proposal to strip the sign bit.

How does your proposal try to issue OCSP requests? Will it now issue two OCSP requests, one with a stripped leading 0 and one with a leading 0 added? If the webpki internal representation of the serial has all leading 0s stripped, does this mean that webpki will have to always issue two OCSP requests, even when the certificate has originally been encoded validly, because webpki can't distinguish between validly and invalidly encoded certificates?

If, as hypothesized in your proposal, webpki ever gains an accessor and exposes the serial with the sign bit stripped, tools that use this data might create inconsistent output. E.g. think of a CA that uses two different systems for issuance and CRL creation of certificates. The issuance side has the bug that it creates negative serial certificates. This appears to happen in the real world. The CRL creation side then uses webpki to parse an incoming list of certificates the module should integrate into the CRL. Then it uses that parsed result to integrate the serial into the CRL. The CRL would now contain a different serial than originally intended. I think your proposal tries to handle this case, but it can also partake in creating this case.

Personally I feel that webpki should preserve the raw bytes used to encode the serial, including whether it contains a leading zero or not. The case where CAs output different signs for their CRLs and for their certificates should be rare to non existent, and especially I think one can't blame webpki if something goes wrong if it opaquely handles the serial. If one wants to be safe, one can have OCSP handling, CRL checking etc all fail if it's enabled and a negative serial is encountered.

Jun 16 '21 22:06 est31

@est31 wrote:

But I'm a bit critical of the proposal to strip the sign bit.

I'm not proposing the strip the sign bit. Instead, I'm proposing to, effectively, insert the missing 0x00 and continue on as if it were there.

For example, if the value is encoded 0xff, then that is a negative value according to ASN.1 syntax. But, instead, for serial numbers only, my proposal is to pretend it was [0x00, 0xff], which would normally parse to a serial number value of [0xff] (255).

Jun 16 '21 23:06 briansmith

@briansmith yeah I understand, and I read up extra on how integer DER/BER encoding works (two's complement) before I wrote the comment you replied to. My argument still holds if you substitute "stripping the sign bit" with "adding a leading 0x00", this was mainly just a figure of speech.

I think webpki shouln't try to improve the data here, but instead just either fully support negative serials, or fully reject them. Not try to do a smart trick during parsing.

For transparency: just recently I got a bug repoort in rcgen where it couldn't (validly) sign some certs... as it turns out, rcgen's internal data representation is not powerful enough (yet) to fully preserve the subject name. It turned a printable into a utf8 string, invalidating the subject name: https://github.com/est31/rcgen/issues/59 . The same issue applies here.

Jun 16 '21 23:06 est31

I think webpki shouln't try to improve the data here, but instead just either fully support negative serials, or fully reject them. Not try to do a smart trick during parsing.

If an entry in a CRL (a revoked one) is encoded correctly with the leading zero, but the certificate isn't, or vice versa, then we should consider the certificate to be revoked. If we required the encoding in the CRL to also be a negative number then we'd be potentially forcing the CRL software maker to have to add buggy behavior in order to revoke a certificate created by a buggy serializer, which seems like a non-starter to me, especially because certificates are revoked in emergency situations.

In other words, I think it is a hard requirement that a syntactically-correct CRL should be able to revoke a certificate with a syntactically-incorrect serial number, for any form of serial number we support.

Jun 17 '21 00:06 briansmith

For transparency: just recently I got a bug repoort in rcgen where it couldn't (validly) sign some certs... as it turns out, rcgen's internal data representation is not powerful enough (yet) to fully preserve the subject name. It turned a printable into a utf8 string, invalidating the subject name: est31/rcgen#59 . The same issue applies here.

The rcgen issue is quite different, I think, for reasons that are specific to how name encodings need to be handled in certificates.

Jun 17 '21 00:06 briansmith

How does your proposal try to issue OCSP requests? Will it now issue two OCSP requests, one with a stripped leading 0 and one with a leading 0 added?

OK, I see now why you are concerned about my proposal. Yes, I guess we'd have to request both if we were to ever implement OCSP fetching. I expect that we'll likely only have built-in support for OCSP stapling, not fetching, though.

Jun 17 '21 00:06 briansmith

webpki webpki copied to clipboard

Accept certificates with invalid serial number encodings

webpki
webpki copied to clipboard