opentitan icon indicating copy to clipboard operation
opentitan copied to clipboard

[rom] Change the endianness of FORS indices for SPHINCS+.

Open jadephilipoom opened this issue 1 year ago • 7 comments

This backwards-incompatible change is part of the changes between the round 3 SPHINCS+ submission to the NIST PQC competition and the SLH-DSA draft standard. Part of https://github.com/lowRISC/opentitan/issues/21944

Corresponds to this commit from the SPHINCS+ reference implementation (consistent-basew branch): https://github.com/sphincs/sphincsplus/commit/129b72c80e122a22a61f71b5d2b042770890ccee

Test vectors were re-generated from the consistent-basew branch of the reference implementation.

~Note that I think this might break the cryptotest SPHINCS+ setup, because they are right now hard-coded to fetch a specific self-hosted file for the round3 submission: https://github.com/lowRISC/opentitan/blob/5dbbd252ce8531680fde18361614ed07c783b5c7/third_party/sphincsplus/repos.bzl#L19~ ~@RyanTorok @milesdai do either of you know if there is another file there, or how we can update it? I don't think NIST is actually hosting a corresponding KAT file for the draft standard yet, so we could also potentially wait until they release final test vectors (but in that case we should probably add a comment that this setup will fail tests on master, and explain why). The round3 tests are still useful for ES sival.~

Edit: now addressed, thanks Ryan!

jadephilipoom avatar May 03 '24 08:05 jadephilipoom

If we find a new archive with KAT tests for the new version, it would be easy to upload it to the cloud storage and use (we would likely upload it alongside the old one, to preserve compatibility for other branches).

The archive file in the GCP storage for SPHINCS+ is a bit special compared to the rest of the cryptotest framework. Due to size considerations, the SPHINCS+ archive is manually created by us (which is why the GCP storage is primary, unlike the other algorithms, which use the NIST site as primary and GCP as backup). "Manually created" here just means we un-compressed the archive, removed the KAT tests for all the variations of SPHINCS+ we don't support, and re-compressed it.

That original zip came from the NIST round 3 submission available directly on the SPHINCS+ site, and did not involve the SPHINCS+ repo at all. I took a look at the new branch matching the new draft standard, and it is possible to generate the test vectors for this version, but this won't be possible hermetically.

We may be able to create a new archive file to store in the GCP using the temporary files generated by vectors.py, but the process to create the archive would be more complicated than simply removing a bunch of irrelevant files, and I'm not sure what implications that has for reproducibility.

RyanTorok avatar May 03 '24 12:05 RyanTorok

I generated a new archive file with KAT tests using the consistent_basew branch, and uploaded it to the GCP bucket. I updated the manifest to include instructions to reproduce the zip archive.

RyanTorok avatar May 06 '24 12:05 RyanTorok

I updated #21681 to switch the source of the test vectors to the new GCP-backed archive for the consistent-basew version of SPHINCS+.

RyanTorok avatar May 06 '24 18:05 RyanTorok

Thanks @jadephilipoom. I assume this will break the current rom e2e sigverify_spx tests.

Is there an equivalent update for opentitantool?

Good point. I looked at the pqcrypto rust crate we're using, and it looks like it pulls from PQClean, which pulls from the master branch of the reference implementation :disappointed: so it's hard to just change something "easy" there and update it. There's an open issue on PQClean to update to the to-be-standardized version, but the timeline is unclear. We could try switching to a different crate, but the ones I can see that advertise "slh-dsa" don't seem very well-established. Not sure what we should do about this. We could depend on one of the newer crates, since it's just for testing anyway, or we could always hold off on merging this PR until the last minute or disable the tests that would fail until pqcrypto updates, but it would be good to run end-to-end tests before ROM freeze.

jadephilipoom avatar May 06 '24 18:05 jadephilipoom

Thanks for looking into the host side tooling @jadephilipoom. Should we be considering to do a similar thing to what we did on the device side an maintain our own host-side implementation? It seems that there are still additional changes in the pipeline which are going make it difficult for us to run integration testing.

Should we raise this issue to the Software WG?

moidx avatar May 06 '24 21:05 moidx

Thanks for looking into the host side tooling @jadephilipoom. Should we be considering to do a similar thing to what we did on the device side an maintain our own host-side implementation? It seems that there are still additional changes in the pipeline which are going make it difficult for us to run integration testing.

Should we raise this issue to the Software WG?

Yes, seems like a good topic for Software WG (and good timing too!)

jadephilipoom avatar May 07 '24 06:05 jadephilipoom

This issue was discussed in the software working group today. The consensus was that we should write our own Rust wrapper for the reference implementation.

mundaym avatar May 07 '24 15:05 mundaym

For visibility: I've cherry-picked the last commit from https://github.com/lowRISC/opentitan/pull/23326; this puts the host-side tooling in sync with the device for the endianness swap so that tests work. If it passes CI, I'll just merge this and close #23326,. Since both sets of changes have been reviewed and approved, I don't think it requires any additional review; just wanted to put an explanation for the new commit post-approve.

jadephilipoom avatar May 30 '24 14:05 jadephilipoom

There are a couple remaining test failures because (I think) some pre-signed binaries need to be manually re-generated. Can anyone tell me what the procedure is for that? I can see that commit c36ad25e8fbe158314dbe57f9f33a1576b71281d re-generated them due to opentitantool updates, which looks like the same thing I need, but I can't figure out which commands to run to do that again. @timothytrippel , do you remember?

jadephilipoom avatar Jun 04 '24 10:06 jadephilipoom

Update: after talking to @cfrantz and @moidx , I understand that none of the pre-signed binaries should need to be regenerated, and the tests are likely failing because the signatures opentitantool are producing are incorrect. ~I added a KAT on the opentitantool impl and it failed, so that is definitely the case and I'm working on figuring out what the bug is.~ (edit: it's not actually failing, I just forgot that SPHINCS+ signing is nondeterministic :woman_facepalming: anyway, the verification KAT passes, so I'm not sure what's happening.)

jadephilipoom avatar Jun 04 '24 16:06 jadephilipoom

Re-generating the keys under sw/device/silicon_creator/rom/keys/fake/spx fixed some of the failing tests (last commit). I find this really odd, because the key-generation procedure really should not have been affected by the endianness switch. My best guesses are:

  • something subtle changed in the interpretation of the PEM files (maybe some endianness reversal somewhere that made the new host-side code incompatible with the old)
  • somewhere a signature was saved, and for some reason got re-generated when I updated the keys
  • I'm wrong about key generation not being affected (but I just checked again now and I really don't see it)

Let's see what CI says, and I'll keep investigating in the meantime; I don't really want to leave this as a question mark. Ideally I'll figure out where the switch is and remove the last commit that re-generates the fake keys, but I wanted to pass it through CI to gather some more information.

CC @moidx @cfrantz

jadephilipoom avatar Jun 11 '24 10:06 jadephilipoom

Update: CI result has two failing tests, but both appear to be failing on all CI jobs at the moment and I think they're unrelated.

  • //sw/device/silicon_creator/rom/e2e/boot_policy_flash_ecc_error:a_valid_b_corrupt_manifest_security_version_fpga_cw310_rom_with_fake_keys
  • //sw/device/tests:uart_smoketest_fpga_cw340_rom_with_fake_keys

I don't have a CW340, but I can replicate the first one on master and both are also failing on https://github.com/lowRISC/opentitan/pull/23598, which only adds test data that's not used by anything. So I think it's safe to say that re-generating the keys fixed all the failures (although I'm not satisfied until I know why).

jadephilipoom avatar Jun 11 '24 12:06 jadephilipoom

Okay, I've figured out what's going on. Long story short: we should regenerate the test keys, so I'll leave the commit in. The old pqcrypto setup was using an outdated implementation of keygen/sign. Verification works as long as key generation and signing agree, but not if they disagree, which is what happened when I introduced updated signing code but left the old keys.

Here's a more detailed debugging log:

  • I was able to observe using the keypair_from_seed function that our bindgen for the SPHINCS+ reference implementation produced different public key data (PK.root) given the seed from the test keys.
  • I checked the same bindgen code against a seed from the modern SPHINCS+ KATs and found that they agreed.
  • I looked into the pqcrypto version we're using and found it's 0.6.4, which was released April 2022.
  • It wasn't straightforward to trace through pqcrypto and PQClean and figure out exactly which SPHINCS+ commit that was, so I just went back in history in my local clone of sphincsplus and kept re-generating KATs with the same seed values until I saw the value change.
  • The change happens in this commit from sphincsplus from June 2022, when the signing and key generation code is modified to have slightly different address types: https://github.com/sphincs/sphincsplus/commit/840524569d65bdeb1e19a30ef11c9c53c4c489ac
  • Verification code is not changed in that commit, which matches our observations (the modern verification routine worked with the outdated keys/signatures).

jadephilipoom avatar Jun 13 '24 10:06 jadephilipoom

Fixed to address @moidx's comment on real keys and also to re-generate "unauthorized" keys. I'll consider this OK to merge once CI passes, unless anyone objects.

jadephilipoom avatar Jun 13 '24 10:06 jadephilipoom