
Better error messages for replay attacks

Open 3hhh opened this issue 3 years ago • 15 comments

Is your feature request related to a problem? Please describe. I tested some /boot/ replay attacks against heads with a Nitrokey in HOTP mode. Usually they resulted in PCR XYZ mismatch and spawned an emergency shell on default boot. Selecting "Show OS boot menu" resulted in no error at all and just returned me to the previous menu.

Describe the solution you'd like Some error clearly saying that the user might be the victim of a replay attack.

Describe alternatives you've considered Current situation: Cryptic messages or none at all.

Additional context To test, I reused an old signed /boot/kexec_hashes.txt with the corresponding files. I kept the other kexec_* files.

3hhh avatar Dec 13 '22 20:12 3hhh

@3hhh HOTP/TOTP firmware integrity attestation requires the sealed secret to be unsealed, which only works when the current PCR values match the values it was sealed against.

A PCR XYZ mismatch error is given if the firmware has been modified compared to the values sealed in TPM NVRAM. It comes from the unsealing operation that is needed prior to generating the TOTP code on screen or challenging the secret against HOTP with the USB Security dongle.

  • https://github.com/osresearch/heads/blob/315febdf74938691a93d8f2cc657bd55d7d922e6/initrd/bin/unseal-totp
  • https://github.com/osresearch/heads/blob/315febdf74938691a93d8f2cc657bd55d7d922e6/initrd/bin/unseal-hotp

This is not linked to detached signed digests validation against your public key fused in ROM.


Here is what happens when you select Show boot options:

Show OS boot option under gui-init https://github.com/osresearch/heads/blob/315febdf74938691a93d8f2cc657bd55d7d922e6/initrd/bin/gui-init#L423

calls select_os_boot_option https://github.com/osresearch/heads/blob/315febdf74938691a93d8f2cc657bd55d7d922e6/initrd/bin/gui-init#L433

which calls verify_global_hashes https://github.com/osresearch/heads/blob/315febdf74938691a93d8f2cc657bd55d7d922e6/initrd/bin/gui-init#L538-L539

which calls initrd/etc/functions's check_config with force https://github.com/osresearch/heads/blob/315febdf74938691a93d8f2cc657bd55d7d922e6/initrd/bin/gui-init#L66-L67

https://github.com/osresearch/heads/blob/315febdf74938691a93d8f2cc657bd55d7d922e6/initrd/etc/functions#L248-L274


At this point, some die message should be given on the recovery shell as a trace. Can you provide such?

On my side, I will have to review the reasoning of why verify_global_hashes is calling check_config with force, but there seems to be error handling missing in the case where verify_global_hashes is not successful, which explains why you are getting back to the previous menu https://github.com/osresearch/heads/blob/315febdf74938691a93d8f2cc657bd55d7d922e6/initrd/bin/gui-init#L535-L541

In your case, check_config fails, but since verify_global_hashes has no else statement, the return call makes you return to previous menu.
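The silent return described above can be illustrated with a few lines of shell. This is a minimal sketch, not the actual gui-init code: `check_config_stub`, `select_boot_silent` and `select_boot_verbose` are hypothetical names, and the stub simply fails the way check_config does in the replay test.

```shell
#!/bin/sh
# Hypothetical stand-in for check_config: always fails, as in the replay test.
check_config_stub() { return 1; }

# No else branch: on failure the function prints nothing and returns 0,
# so the caller just redraws the previous menu.
select_boot_silent() {
    if check_config_stub; then
        echo "booting"
    fi
}

# With an else branch the user at least learns why nothing happened.
select_boot_verbose() {
    if check_config_stub; then
        echo "booting"
    else
        echo "ERROR: /boot hashes could not be verified" >&2
        return 1
    fi
}

silent_out=$(select_boot_silent 2>&1)
verbose_status=0
verbose_out=$(select_boot_verbose 2>&1) || verbose_status=$?
echo "silent: '$silent_out' / verbose: '$verbose_out' (status $verbose_status)"
```

The first variant produces no output at all, which matches the behaviour reported in this issue.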


There is no TPM rollback prevention on /boot digests. /boot content is verified against GPG detached signed digests; TPM/HOTP rollback counters are completely different things. The HOTP counter you seem to be talking about is a simple counter that is kept in sync between your laptop and the USB Security dongle. This currently binds a HOTP security dongle to a single laptop and will throw an error if that same HOTP-enabled Security dongle is used on another computer, since the counter would be out of sync.

Can you please provide the errors given at the recovery shell console? (They would be present if your flashed ROM contains that commit https://github.com/osresearch/heads/commit/139ecb82b254da0b3808421e00fb6a8b11edbdb9, dated November 16th 2022, which was "clearing" the console in between whiptail GUI menu drawings.)

Also, die and warn calls now sleep, providing output to the user https://github.com/osresearch/heads/blob/315febdf74938691a93d8f2cc657bd55d7d922e6/initrd/etc/functions#L4-L13 since https://github.com/osresearch/heads/commit/b67f8e19ce1e00f317cb784d94b4e29d62cee8e3. Those would be helpful to understand what is happening in your tests.

tlaurion avatar Dec 14 '22 18:12 tlaurion

This is not linked to detached signed digests validation against your public key fused in ROM.

Pretty sure it is somewhere as otherwise replay attacks would just work. Btw I double checked: It was PCR 4 that it complained about. It also complained about GPG verification failure.

Anyway I'm running a bit old version of heads. So let me outline more in detail on how to reproduce it:

  1. Save /boot/kexec_hashes.txt, /boot/kexec.sig and /boot/[file to change in 2] somewhere.
  2. Change a file (e.g. /boot/.auditing-0, not sure what it's for anyway).
  3. Reboot & sign again.
  4. Restore the files saved in 1.
  5. Reboot.
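Steps 1 and 4 can be scripted so the test is easy to repeat. The sketch below is only an illustration: `save_boot_state` and `restore_boot_state` are hypothetical helpers, the directory arguments are assumptions, and it only handles files directly under the boot directory. Steps 2, 3 and 5 still happen in Heads itself.

```shell
#!/bin/sh
# save_boot_state BOOT_DIR BACKUP_DIR [EXTRA_FILE...]
# Step 1: copy the signed digests, the signature and any extra files aside.
save_boot_state() {
    boot_dir=$1; backup_dir=$2; shift 2
    mkdir -p "$backup_dir" || return 1
    for f in kexec_hashes.txt kexec.sig "$@"; do
        cp -p "$boot_dir/$f" "$backup_dir/$f" || return 1
    done
}

# restore_boot_state BOOT_DIR BACKUP_DIR
# Step 4: copy everything saved earlier back, including dotfiles.
restore_boot_state() {
    boot_dir=$1; backup_dir=$2
    for path in "$backup_dir"/* "$backup_dir"/.[!.]*; do
        [ -e "$path" ] || continue   # skip patterns that matched nothing
        cp -p "$path" "$boot_dir/" || return 1
    done
}
```

A call such as `save_boot_state /boot /media/backup .auditing-0` before step 2 and `restore_boot_state /boot /media/backup` at step 4 would cover the sequence above; the backup path is just an example.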

I guess that the signature and verification happens with a "temporary" key derived from the counter and the hidden master key or so.

3hhh avatar Dec 15 '22 17:12 3hhh

The default boot option checks for detached signatures, as opposed to show boot options above, which also checks the signature when the time comes to boot one of those options, unless the unsafe boot option is selected.

https://github.com/osresearch/heads/blob/315febdf74938691a93d8f2cc657bd55d7d922e6/initrd/bin/gui-init#L330-L340

https://github.com/osresearch/heads/blob/315febdf74938691a93d8f2cc657bd55d7d922e6/initrd/bin/gui-init#L343-L346

https://github.com/osresearch/heads/blob/315febdf74938691a93d8f2cc657bd55d7d922e6/initrd/bin/gui-init#L543-L558

The same verify_global_hashes first tests check_config with force (no checking of kexec* files against the detached signature), and that check is delayed until booting: https://github.com/osresearch/heads/blob/315febdf74938691a93d8f2cc657bd55d7d922e6/initrd/bin/gui-init#L547-L549 here: https://github.com/osresearch/heads/blob/315febdf74938691a93d8f2cc657bd55d7d922e6/initrd/bin/gui-init#L552

Which this time calls check_config without force: https://github.com/osresearch/heads/blob/315febdf74938691a93d8f2cc657bd55d7d922e6/initrd/bin/kexec-select-boot#L320-L325

here (caller doesn't pass force to check_config): https://github.com/osresearch/heads/blob/315febdf74938691a93d8f2cc657bd55d7d922e6/initrd/etc/functions#L265-L269

Which in your test should give error when booting default option, or when you select a boot option, unless check_config force fails verifying one of the checksums under kexec_hashes.txt for a file under /boot (which you seem to call "rollback prevention" here, which as stated, doesn't exist for /boot files themselves).

The only rollback mechanisms that exist are implemented through the TPM, to verify that the TPM counter is consistent since TPM ownership. But there is some misunderstanding here and, as said before, error output would permit us to elaborate and conclude. Also, the commit id of the version you currently have flashed (present under /etc/config) would be helpful along with the errors obtained.

Also note that PCR values are documented at https://osresearch.net/Keys/#tpm-pcrs where PCR4 is extended when you go into the recovery shell and back, which prevents unsealing the TPM disk encryption key on the default boot option. Hopefully this is clearer?

Otherwise, from the code paths exposed above (default boot options or show boot options, where the unsafe boot option is the only one skipping kexec.sig validation against kexec* files), invalid hashes for files present under /boot should output something from verify_global_hashes, and attempting to boot will validate the detached signed digests (kexec.sig) prior to booting, unless the unsafe boot option is chosen.

There is a "die" missing though, as said prior, where verify_global_hashes failing should not silently return to previous menu without at least stating that the hashes were invalid. That would require a separate bug issue to track and fix.

tlaurion avatar Dec 16 '22 15:12 tlaurion

Without any understanding of how the replay attack protection works, I doubt we'll be able to find out whether a more precise error message could be implemented or not. Btw it's not a derived key as I had previously guessed - one can see that the same GPG key is re-used from the GPG verification output.

There is a "die" missing though, as said prior, where verify_global_hashes failing should not silently return to previous menu without at least stating that the hashes were invalid. That would require a separate bug issue to track and fix.

I created #1257 for that.

3hhh avatar Dec 17 '22 10:12 3hhh

Btw it's not a derived key as I had previously guessed - one can see that the same GPG key is re-used from the GPG verification output.

I'm confused by that statement. Please give console output, code lines corresponding with your firmware version or something I can chew on.

Without any understanding of how the replay attack protection works, I doubt we'll be able to find out whether a more precise error message could be implemented or not.

The rollback protection is limited, and happens when signing /boot content.

Let's explore code to fact check. The file in question is under /boot/kexec_rollback.txt. Code refers to TPM rollback with TPM functions like read_tpm_counter or check_tpm_counter, where it is actually used (non-optional) per kexec-sign-config under current codebase.

Some background: https://github.com/osresearch/heads/blob/bf3898a2a1465ae5360c85947c25abf85fc7b443/initrd/bin/kexec-sign-config#L27-L40

kexec_hashes.txt here includes everything but kexec files per find statement, neglecting those files.
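To make that split concrete, the sketch below lists which files would end up in kexec_hashes.txt and which ones are only covered by kexec.sig. The `find` expressions are an approximation of what kexec-sign-config does, not a copy of it, and both function names are hypothetical.

```shell
#!/bin/sh
# list_hashed_files DIR: files that would be listed in kexec_hashes.txt
# (assumption: every regular file whose name does not start with "kexec").
list_hashed_files() {
    (cd "$1" && find . -type f ! -name 'kexec*' | sort)
}

# list_signed_files DIR: the kexec*.txt files whose digests kexec.sig covers.
list_signed_files() {
    (cd "$1" && find . -type f -name 'kexec*.txt' | sort)
}
```

Running both against a copy of /boot shows that kexec_rollback.txt appears only in the second list, which is why a hash check alone cannot notice a rolled-back counter file.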

Now: https://github.com/osresearch/heads/blob/bf3898a2a1465ae5360c85947c25abf85fc7b443/initrd/bin/kexec-sign-config#L42-L61

The above reads the counter (read_tpm_counter) from the TPM, if the TPM has been owned by the user, to get its value in RAM under $counter, then increments the counter through check_tpm_counter against kexec_rollback.txt.

The kexec.sig includes the detached signature of that kexec_rollback.txt (not to be mixed with the kexec_hashes.txt). kexec.sig is validated later on at kexec-select-boot, but is generated here:

https://github.com/osresearch/heads/blob/bf3898a2a1465ae5360c85947c25abf85fc7b443/initrd/bin/kexec-sign-config#L68-L78

The kexec.sig detached signature is validated by check_config when it is not called with the force parameter https://github.com/osresearch/heads/blob/c1fb04cd5c9b8e5ffd3430e2ab370529468d859b/initrd/etc/functions#L265-L269

https://github.com/osresearch/heads/blob/139ecb82b254da0b3808421e00fb6a8b11edbdb9/initrd/bin/kexec-select-boot#L320-L325

That kexec-select-boot call happens from the default boot option or the show boot option, and force is added when the unsafe boot option is called.

Makes more sense?


Short version: the "replay attack" prevention you are talking about is actually implemented at the time of detach-signing the /boot configs, and is validated on the fly when the kexec.sig detached signature of /boot content is validated, for which kexec_rollback.txt has been created at the last detached signing of /boot content.

So if I understand well the present issue you have created, you would love kexec-sign-config to do something different here https://github.com/osresearch/heads/blob/21505aa5dd65b24f22b8ee9fe9624510b6af0b06/initrd/bin/kexec-sign-config#L42-L61

Or you would love kexec.sig verification to provide more output than it currently gives (but kexec.sig doesn't include any detail of the detached signed content. It's a binary success/failure here).

Could you detail a little bit more what you would love to see happening here?

If I take your notes here

To test, I reused an old signed /boot/kexec_hashes.txt with the corresponding files. I kept the other kexec_* files.

This is not sufficient to do anything with. The check_config force call will validate kexec_hashes.txt, which, as we dug up, doesn't include kexec_rollback.txt. That file is only verified against kexec.sig when doing a default boot/show boot option without forcing unsafe boot, which is where kexec.sig is validated.

  1. Save /boot/kexec_hashes.txt, /boot/kexec.sig and /boot/[file to change in 2] somewhere.
  2. Change a file (e.g. /boot/.auditing-0, not sure what it's for anyway).
  3. Reboot & sign again.
  4. Restore the files saved in 1.
  5. Reboot.

I guess that the signature and verification happens with a "temporary" key derived from the counter and the hidden master key or so.

We understood from this code digging that signing /boot content will update the TPM counter recorded under kexec_rollback.txt, which is taken into consideration under kexec.sig. So here again, default boot/show boot option will fail. I understand you would want more detail on the reason for the failures, but the only thing Heads knows is that the detached signature (kexec.sig) is not matching /boot; it cannot give more details than this.

@3hhh What would you want to see and where from the code snippets provided? Currently, this is the error Heads would provide to tell globally that kexec.sig (detached signature) failed without giving details it doesn't have: https://github.com/osresearch/heads/blob/bf3898a2a1465ae5360c85947c25abf85fc7b443/initrd/etc/functions#L265-L269

From your example above, where you signed (and then updated the TPM rollback counter) but moved back the backed-up kexec.sig (which contains the detached signature for a kexec_rollback.txt that is now invalid), default booting should fail with 'Invalid signature on kexec boot params'

tlaurion avatar Dec 19 '22 17:12 tlaurion

@3hhh : Which goes back to quick testing:

[user@dom0 ~]$ sha256sum $(find /boot/kexec*.txt) | gpg --verify /boot/kexec.sig -
gpg: Signature made Sat 17 Dec 2022 04:13:36 PM EST
gpg:                using RSA key ACF4B7893D4D05C8F18069BAE7B4A71658E36A93
gpg: Good signature from "Insurgo Technologies Libres / Open Technologies <[email protected]>" [unknown]
gpg:                 aka "[jpeg image of size 9521]" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: ACF4 B789 3D4D 05C8 F180  69BA E7B4 A716 58E3 6A93

Which kexec.sig detached signature is for:

[user@dom0 ~]$ find /boot/kexec*.txt
/boot/kexec_default.1.txt
/boot/kexec_default_hashes.txt
/boot/kexec_hashes.txt
/boot/kexec_key_devices.txt
/boot/kexec_lukshdr_hash.txt
/boot/kexec_rollback.txt

[user@dom0 ~]$ sha256sum $(find /boot/kexec*.txt)

ae7b4ffe5c7ef8f04bb7cebaf3b2a46d3b444b84b0ce23346365012fd0288828  /boot/kexec_default.1.txt
408ad13a31d4bb82b79745b82c36a1cad0507d4d453c156c15c83d2711fb4eef  /boot/kexec_default_hashes.txt
0ff5fd8164a49fc5fc4b7d98a1c3972974f72d655ee2fba8a7782a2bac59abf9  /boot/kexec_hashes.txt
96cb689c5c5a2fd69382796392c5a407bb28b3102a71b753806f06d31b9c9c01  /boot/kexec_key_devices.txt
8f6fb96013e7345f54ac38b78ce5d0c9184498711322774cb093e7050b5a7c68  /boot/kexec_lukshdr_hash.txt
1de13fc06ab9a4ede0f536d634361e864f2d202b2fe115231be1d73eb4411971  /boot/kexec_rollback.txt

And if I modify one bit in /boot/kexec_rollback.txt for the sake of the test:

[user@dom0 ~]$ sha256sum $(find /boot/kexec*.txt)
ae7b4ffe5c7ef8f04bb7cebaf3b2a46d3b444b84b0ce23346365012fd0288828  /boot/kexec_default.1.txt
408ad13a31d4bb82b79745b82c36a1cad0507d4d453c156c15c83d2711fb4eef  /boot/kexec_default_hashes.txt
0ff5fd8164a49fc5fc4b7d98a1c3972974f72d655ee2fba8a7782a2bac59abf9  /boot/kexec_hashes.txt
96cb689c5c5a2fd69382796392c5a407bb28b3102a71b753806f06d31b9c9c01  /boot/kexec_key_devices.txt
8f6fb96013e7345f54ac38b78ce5d0c9184498711322774cb093e7050b5a7c68  /boot/kexec_lukshdr_hash.txt
2cb38687ec9e948020bbee25523dcc6114bc72e221b0ee2ed3cf3e2d98fd5099  /boot/kexec_rollback.txt

Which fails detached signature validation:

[user@dom0 ~]$ sha256sum $(find /boot/kexec*.txt) | gpg --verify /boot/kexec.sig -
gpg: Signature made Sat 17 Dec 2022 04:13:36 PM EST
gpg:                using RSA key ACF4B7893D4D05C8F18069BAE7B4A71658E36A93
gpg: BAD signature from "Insurgo Technologies Libres / Open Technologies <[email protected]>" [unknown]

So if I understand you correctly, you would love to have additional checks of the TPM counter checksum at https://github.com/osresearch/heads/blob/bf3898a2a1465ae5360c85947c25abf85fc7b443/initrd/etc/functions#L265-L269

[user@dom0 ~]$ cat /boot/kexec_rollback.txt 
e3c832c290bda9c1c57d0539ccfc5e2895596052d124472a1911c95f7938e543  /tmp/counter-53486821

tlaurion avatar Dec 19 '22 18:12 tlaurion

I think that https://github.com/osresearch/heads/blob/139ecb82b254da0b3808421e00fb6a8b11edbdb9/initrd/bin/kexec-select-boot#L69-L83

Should be moved to /etc/functions and code refactored to use it better. Everything boot related is in kexec-select-boot, so I think doing calls to verify_global_hashes being a prerequisite to those calls is not bad, while implementing checks for TPM counter (sha256sum -c and proper error handling) would be the way to go to give more output on the present matter.
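As a rough idea of what such a check could look like, here is a sketch built around `sha256sum -c` with one message per failure cause. `check_rollback_file`, its return codes and its messages are hypothetical and do not come from the Heads codebase; it assumes the counter file named inside the rollback file has already been written out.

```shell
#!/bin/sh
# check_rollback_file ROLLBACK_FILE
# Verifies the counter file named inside ROLLBACK_FILE against its recorded
# sha256 digest. Returns 2 if the rollback file is missing, 3 on mismatch.
check_rollback_file() {
    rollback_file=$1
    if [ ! -r "$rollback_file" ]; then
        echo "Rollback file missing: $rollback_file" >&2
        return 2
    fi
    if ! sha256sum -c "$rollback_file" >/dev/null 2>&1; then
        echo "TPM counter does not match $rollback_file" >&2
        return 3
    fi
    return 0
}
```

Separate return codes let the caller pick a different message for a missing file than for a counter mismatch, which is the kind of extra output requested in this issue.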

tlaurion avatar Dec 19 '22 19:12 tlaurion

Ok, I think I now understand how the replay attack protection in heads is supposed to work (it's called "rollback counter" though, which made me ignore that part of the code all of the time...):

  1. At signature time, a TPM counter is created and its content stored at /tmp/counter-[name].
  2. That counter is hashed, the hash is stored at /boot/kexec_rollback.txt.
  3. At verification time, the counter is read from the TPM and stored at /tmp/counter-[name].
  4. Verification fails, if a) the signature mismatches as someone didn't roll back/replay /boot/kexec_rollback.txt (my case above). b) the counter hash mismatches as the TPM had increased its counter during a more recent sign operation.

The security is based on the fact that TPM counters can only be incremented and never set to a specific value.
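The four steps can be tried without a TPM. In the simulation below a plain file stands in for the monotonic counter and the GPG signature is left out entirely, so it only models case b). All `sim_*` names and file names are made up for the illustration.

```shell
#!/bin/sh
# Simulation only: a plain file stands in for the TPM monotonic counter.
SIM_DIR=$(mktemp -d)
mkdir "$SIM_DIR/boot"
echo 0 > "$SIM_DIR/tpm_counter"

# The only write operation offered: increment by one, never set a value.
sim_increment_counter() {
    value=$(cat "$SIM_DIR/tpm_counter")
    echo $((value + 1)) > "$SIM_DIR/tpm_counter"
}

# Steps 1-2: bump the counter, dump it, store the dump's hash under boot/.
sim_sign() {
    sim_increment_counter
    cp "$SIM_DIR/tpm_counter" "$SIM_DIR/counter-dump"
    (cd "$SIM_DIR" && sha256sum counter-dump > boot/kexec_rollback.txt)
}

# Steps 3-4b: dump the current counter and compare it to the stored hash.
sim_verify() {
    cp "$SIM_DIR/tpm_counter" "$SIM_DIR/counter-dump"
    (cd "$SIM_DIR" && sha256sum -c boot/kexec_rollback.txt >/dev/null 2>&1)
}

sim_sign                                    # first signing
cp "$SIM_DIR/boot/kexec_rollback.txt" "$SIM_DIR/old_rollback.txt"
sim_sign                                    # a later, legitimate signing
sim_verify && fresh_result=ok || fresh_result=mismatch
cp "$SIM_DIR/old_rollback.txt" "$SIM_DIR/boot/kexec_rollback.txt"   # replay
sim_verify && replay_result=ok || replay_result=mismatch
echo "current: $fresh_result, replayed: $replay_result"
rm -rf "$SIM_DIR"
```

This prints `current: ok, replayed: mismatch`: because the counter can only move forward, the old hash can never match again.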

Obviously a) is a generic verification failure and cannot be more precise, i.e. my original request is invalid.

b) could be more precise here though: I'd suggest an error message such as "Invalid TPM counter state. Replay attack?!". The same probably applies to the other verify_rollback_counter error messages. Users don't really care or know how the replay/rollback attack protection is implemented (i.e. what the TPM counters are used for). They want to know whether they were attacked or not.

3hhh avatar Dec 25 '22 11:12 3hhh

Hitting this when testing same disk with qemu-coreboot tpm1/tpm2 boards (EDIT: #1292)

Issue arises after resealing TOTP/HOTP after switching board config and generating hashes + sign instead of resetting TPM. EDIT: this happens when reusing the same OS disk image between board configs, that is, with different swtpm instances and therefore a different TPM counter than the one saved under /boot/

Basically, Heads refuses to generate+sign hashes when the TPM counter is different than expected.

tlaurion avatar Mar 13 '23 13:03 tlaurion

increment_tpm_counter()
{
        TRACE "Under /etc/functions:increment_tpm_counter"
        tpmr counter_increment -ix "$1" -pwdc '' \
                | tee /tmp/counter-$1 \
        || die "Invalid TPM counter state. Replay attack?!"
}

@3hhh would this be enough for your taste?
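One thing worth checking with this pattern: in a pipeline the shell normally reports the exit status of the last command (`tee` here), so a failing first command only triggers the `||` branch if `pipefail` is enabled in the calling script, which I have not verified for every caller. The sketch below shows the difference with a hypothetical `fake_increment` that always fails; it is not the real tpmr call.

```shell
#!/bin/sh
# Make the default explicit where the shell supports the option.
(set +o pipefail) 2>/dev/null && set +o pipefail

out_file=$(mktemp)

# Hypothetical stand-in for a failing counter increment command.
fake_increment() { echo "counter error"; return 1; }

# Pattern from the snippet: the fallback keys off tee's exit status.
piped_result=ok
fake_increment | tee "$out_file" >/dev/null || piped_result=died

# Alternative: capture the output first, then test the command's own status.
captured_result=ok
if value=$(fake_increment); then
    printf '%s\n' "$value" > "$out_file"
else
    captured_result=died
fi

echo "pipeline: $piped_result, captured: $captured_result"
rm -f "$out_file"
```

Without pipefail this prints `pipeline: ok, captured: died`, so the new message would only be shown in the second form.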

tlaurion avatar Mar 13 '23 14:03 tlaurion

Are you sure the failure comes from increment_tpm_counter? In current master increment_tpm_counter prints "Counter increment failed".

As I had said I rather think it's verify_rollback_counter and voted for changing its 3 error messages, but I might be wrong or there are multiple error paths.

3hhh avatar Mar 18 '23 10:03 3hhh

Both verify and increment check for it. The point here is that the counter file inside of /boot is put under /tmp for verification through the kexec.sig validation of everything kexec*.txt, which is excluded from the global checksums file at creation and therefore cannot be verified globally. Only TPM increment and check will fail here.

I think the proposed message should be part of both die statements, which pause and show the message on the console for 2 seconds to the user.

As discussed in other issues, the kexec.sig detached signature (which covers the TPM counter because it is in a kexec*.txt file) will fail verification but will not give the user any useful information on that point, whereas verifying checksums will not cover discrepancies for that counter, since it is not included.

I think kexec.sig creation and verification should be reworked so that its contents are also checksummed separately, to be able to pinpoint directly where they differ.

But as of now, I think that the above, modified die messages in both the increment and check code paths would give sufficient insight on the TPM counter not being in sync, meaning that the TPM counter is invalid. But why is a different story.

Will try to reproduce again.

tlaurion avatar Mar 18 '23 15:03 tlaurion

It's easy to replicate on qemu as I said, reusing the same qemu disk install between different board configs, so that the TPM involved (per board TPM) gives the error, since the counter on disk is different from what the TPM keeps.

How to test this on the same machine without tampering with the disk image is unknown to me. The goal of the TPM counter is to make sure signed configs are as expected, taking the TPM counter as an external factor that is part of the detached signature. So again, the protection offered here is to refuse booting if that counter is invalid, that is, if /boot has been reverted to contain known vulns to be exploited after boot and where Heads should prevent it.

So my question here is what die "Invalid TPM counter state" should become.

die "Invalid TPM counter state. Replay attack?!" Enough?

tlaurion avatar Mar 18 '23 15:03 tlaurion

So my question here is what die "Invalid TPM counter state" should become.

die "Invalid TPM counter state. Replay attack?!" Enough?

I now consider my initial proposal a bit crude. Probably the following is better: die "[old error message here]. This may indicate a replay attack."

Btw I'm pretty sure https://github.com/osresearch/heads/blob/139ecb82b254da0b3808421e00fb6a8b11edbdb9/initrd/bin/kexec-select-boot#L73 could also indicate a replay attack as the attacker could copy back/replay a bunch of old files to /boot and just remove /boot/kexec_rollback.txt.

"Failed to read TPM counter" at least from verify_rollback_counter should never be seen, rather "Counter read failed". That might happen if someone attacked the TPM (not necessarily only a replay attack IMHO), i.e. "[old error]. This may indicate a replay or TPM attack." might be more suitable there.

Since increment is only called during reset and sign, I guess that you hit into it there? I guess "[old error]. This may indicate a replay or TPM attack." would be appropriate there as well.
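The wording proposed above could be produced by a small helper so that every call site stays consistent. This is only a sketch: `replay_hint_message` is a hypothetical name, and the two example messages are the ones quoted in this thread.

```shell
#!/bin/sh
# replay_hint_message OLD_MESSAGE [KIND]
# Appends the suggested hint to an existing error text. KIND is "replay"
# (default) or "tpm" for errors that could also mean the TPM was attacked.
replay_hint_message() {
    old_message=$1
    kind=${2:-replay}
    case "$kind" in
        tpm) echo "$old_message. This may indicate a replay or TPM attack." ;;
        *)   echo "$old_message. This may indicate a replay attack." ;;
    esac
}

replay_hint_message "Invalid TPM counter state"
replay_hint_message "Counter read failed" tpm
```

A caller would then write something like `die "$(replay_hint_message "Counter read failed" tpm)"`, keeping the original error text intact and only adding the hint.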

3hhh avatar Mar 19 '23 10:03 3hhh

Will try to reproduce again.

I think I found the way to kind of reproduce it, running on Thinkpad X230 with https://github.com/osresearch/heads/pull/1350/commits/3c4b1cc083c962998c90c37f44d4124a42bc3e75

I have /dev/sda1 /boot filesystem with no /boot/grub, only some ISO files lying around.

I do OEM factory reset. What will happen, TPM will be reset but no checksums will be regenerated and re-signed (due to SKIP_BOOT being set).

I reboot after the reset and generate new TOTP, so far so good.

But subsequent attempt to sign /boot from gui-init fails with

Got error 'Bad counter handle' (0x45) from TPM_IncrementCounter.

So I have to reset TPM again, and after this I can sign my /boot... ugh

(filed https://github.com/osresearch/heads/issues/1368 for this one)

saper avatar Apr 06 '23 23:04 saper