tpm2-tss
TCTI state transitions are inconsistent and unpredictable
Looking through the TCTI code, I have noticed a couple of inconsistencies.
Cancellation
The tcti-mssim and tcti-pcap TCTIs will transition directly to the TCTI_STATE_TRANSMIT state after a successful call to cancel. This contradicts the specification which states that another call to receive is required before another command can be transmitted. This means there's no way for the caller to get the response buffer for the cancelled command, which is required to tell if the command actually executed or not. Higher layers in the stack will likely also hit unexpected errors if a command using these TCTIs is cancelled. For example, I think the ESYS library will enter an unrecoverable internal error state if the caller does:
Esys_{Command}_Async(context, ...);
Esys_GetTcti(context, &tcti);
tcti->cancel(tcti);
Esys_{Command}_Finish(context, ...);
which I believe is the intended pattern for cancelling a command. Note that skipping the Esys_{Command}_Finish call leaves the ESYS context in the _ESYS_STATE_SENT state where new commands cannot be issued, which is effectively the same as the unrecoverable error state for this purpose.
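The sequencing the specification describes can be modelled as a small state machine. The sketch below is a self-contained toy model, not tpm2-tss code: the toy_* types and functions and the TOY_RC_* codes are hypothetical names invented for illustration, and rejecting cancel outside the receive state is an assumption of the model. It shows cancel leaving the context in the receive state, so transmit is refused until receive has collected the response for the cancelled command.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names come from tpm2-tss. */
enum toy_state { TOY_STATE_TRANSMIT, TOY_STATE_RECEIVE };
enum toy_rc { TOY_RC_SUCCESS, TOY_RC_BAD_SEQUENCE };

struct toy_tcti {
    enum toy_state state;
    int cancelled; /* set once cancel has been requested */
};

void toy_init(struct toy_tcti *t)
{
    assert(t != NULL);
    t->state = TOY_STATE_TRANSMIT;
    t->cancelled = 0;
}

enum toy_rc toy_transmit(struct toy_tcti *t)
{
    assert(t != NULL);
    if (t->state != TOY_STATE_TRANSMIT)
        return TOY_RC_BAD_SEQUENCE;
    t->state = TOY_STATE_RECEIVE;
    t->cancelled = 0;
    return TOY_RC_SUCCESS;
}

enum toy_rc toy_cancel(struct toy_tcti *t)
{
    assert(t != NULL);
    if (t->state != TOY_STATE_RECEIVE)
        return TOY_RC_BAD_SEQUENCE; /* model assumption: nothing to cancel */
    t->cancelled = 1; /* the state deliberately stays RECEIVE */
    return TOY_RC_SUCCESS;
}

/* A flag stands in for the response buffer, which in a real TCTI would
 * tell the caller whether the command was cancelled or ran to completion. */
enum toy_rc toy_receive(struct toy_tcti *t, int *was_cancelled)
{
    assert(t != NULL && was_cancelled != NULL);
    if (t->state != TOY_STATE_RECEIVE)
        return TOY_RC_BAD_SEQUENCE;
    *was_cancelled = t->cancelled;
    t->state = TOY_STATE_TRANSMIT; /* only receive completes the cycle */
    return TOY_RC_SUCCESS;
}
```

In this model the sequence transmit, cancel, transmit fails on the second transmit, while transmit, cancel, receive, transmit succeeds. That second sequence is the one a finish-style call in a higher layer relies on.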
All other TCTIs in this repo appear to act correctly in this regard. This looks straightforward to fix, but I don't know much about the lower-level interface they're using. Given the above, I would guess no one is relying on this behaviour.
Receive errors
Some TCTIs will also transition to the TCTI_STATE_TRANSMIT state on certain errors in the receive call. The spec doesn't explicitly comment on this either way, and neither do the man pages or installed headers, but there is a comment in the internal tcti-common.h header that claims this occurs for all return codes other than TRY_AGAIN, INSUFFICIENT_BUFFER, BAD_CONTEXT, BAD_REFERENCE, BAD_VALUE, and BAD_SEQUENCE. The behaviour of some TCTIs diverges from this, however. For example:
- tcti-cmd transitions when returning INSUFFICIENT_BUFFER.
- tcti-device does not transition when returning IO_ERROR. It is also inconsistent when returning unmarshaling errors and GENERAL_FAILURE, transitioning on some code paths but not others.
- tcti-i2c-helper only transitions when returning SUCCESS, even though there are paths that return IO_ERROR and paths that pass through lower-layer errors, and nothing documents any restriction on which errors those lower layers may return.
- tcti-pcap doesn't appear to interpret error codes from the underlying TCTI to decide whether to change its own state or not; if the two get out of sync, the TCTI would become unusable.
- tcti-libtpms and tcti-mssim appear to behave correctly, i.e. as described by the comment in tcti-common.h.
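The rule attributed to the tcti-common.h comment can be written as a single predicate. This is only a sketch: the sketch_rc enum is a hypothetical stand-in for the TCTI return codes named above (its numeric values are arbitrary, not those used by tpm2-tss), and the function names are invented for illustration rather than taken from the library.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the TCTI return codes named above;
 * the numeric values are arbitrary, not those used by tpm2-tss. */
enum sketch_rc {
    SKETCH_RC_SUCCESS,
    SKETCH_RC_TRY_AGAIN,
    SKETCH_RC_INSUFFICIENT_BUFFER,
    SKETCH_RC_BAD_CONTEXT,
    SKETCH_RC_BAD_REFERENCE,
    SKETCH_RC_BAD_VALUE,
    SKETCH_RC_BAD_SEQUENCE,
    SKETCH_RC_IO_ERROR,
    SKETCH_RC_GENERAL_FAILURE
};

/* Returns true when, per the rule in the comment, a receive call that
 * returned rc should have moved the context back to the transmit state. */
bool sketch_receive_resets_state(enum sketch_rc rc)
{
    switch (rc) {
    case SKETCH_RC_TRY_AGAIN:
    case SKETCH_RC_INSUFFICIENT_BUFFER:
    case SKETCH_RC_BAD_CONTEXT:
    case SKETCH_RC_BAD_REFERENCE:
    case SKETCH_RC_BAD_VALUE:
    case SKETCH_RC_BAD_SEQUENCE:
        return false; /* stay in the receive state */
    default:
        return true;  /* success and every other error */
    }
}

/* What the rule says for two of the divergent cases listed above. */
void sketch_check_listed_cases(void)
{
    /* tcti-cmd transitions on this code; the rule says it should not. */
    assert(!sketch_receive_resets_state(SKETCH_RC_INSUFFICIENT_BUFFER));
    /* tcti-device does not transition on this code; the rule says it should. */
    assert(sketch_receive_resets_state(SKETCH_RC_IO_ERROR));
}
```

A wrapper such as tcti-pcap would need to apply a predicate like this to the code returned by the underlying TCTI in order to keep its own state in step.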
This causes problems very similar to those of the cancellation issue: higher layers of the stack are entirely unprepared for a state transition on error.
My guess is that this behaviour is intended to distinguish three types of errors through different error codes:
- Transient errors that mean the caller should retry receive.
- Fatal errors that mean the caller should give up on using this TCTI context entirely.
- Errors that mean the caller should give up on getting a response for this command, but can still try to send more commands in the future.
Considering the widely varying existing behaviour, and the common pattern of returning error codes from lower layers without inspection, I think trying to signal the presence or absence of a state transition through the specific error code returned was a mistake.
The most straightforward fix within the existing interface would be to say that error returns never indicate a transition out of the TCTI_STATE_RECEIVE state. Equivalently, transmit can only be called after initialization or a successful receive call. Transient errors then get retried by the caller as appropriate, and a fatal error is represented by simply never being able to perform a successful receive. Higher layers of software generally seem to assume this already, but it's more likely that someone would be broken by this than by the proposed cancellation fix above.
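A minimal sketch of this proposed rule, assuming a toy context rather than the real tpm2-tss structures: the strict_* names and STRICT_RC_* codes are hypothetical, and the lower_rc parameter simulates whatever the underlying transport returned. Because only success changes the state, the lower-layer code can be passed back to the caller without being inspected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of the proposed rule; not tpm2-tss code. */
enum strict_state { STRICT_STATE_TRANSMIT, STRICT_STATE_RECEIVE };
enum strict_rc {
    STRICT_RC_SUCCESS,
    STRICT_RC_TRY_AGAIN,
    STRICT_RC_IO_ERROR,
    STRICT_RC_BAD_SEQUENCE
};

struct strict_tcti {
    enum strict_state state;
};

enum strict_rc strict_transmit(struct strict_tcti *t)
{
    assert(t != NULL);
    if (t->state != STRICT_STATE_TRANSMIT)
        return STRICT_RC_BAD_SEQUENCE;
    t->state = STRICT_STATE_RECEIVE;
    return STRICT_RC_SUCCESS;
}

/* lower_rc simulates the result reported by the layer below. */
enum strict_rc strict_receive(struct strict_tcti *t, enum strict_rc lower_rc)
{
    assert(t != NULL);
    if (t->state != STRICT_STATE_RECEIVE)
        return STRICT_RC_BAD_SEQUENCE;
    if (lower_rc == STRICT_RC_SUCCESS)
        t->state = STRICT_STATE_TRANSMIT; /* the only way out of RECEIVE */
    return lower_rc; /* any error is passed through; the state is unchanged */
}
```

With this rule a caller that sees any error from receive knows the context is still waiting for a response. It can retry, or abandon the context if the error keeps recurring, without having to know which specific code was returned.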
I am currently experiencing issues running the latest tpm2-tss with ibmswtpm2.
Could this bug be the reason, and which tpm2-tss version has a stable mssim TCTI?
[1] 89
root@0a72a66cbc25:/ibmswtpm2/src# LIBRARY_COMPATIBILITY_CHECK is ON
Manufacturing NV state...
Size of OBJECT = 1732
Size of components in TPMT_SENSITIVE = 1096
TPMI_ALG_PUBLIC 2
TPM2B_AUTH 66
TPM2B_DIGEST 66
TPMU_SENSITIVE_COMPOSITE 962
MAX_CONTEXT_SIZE can be reduced to 1808 (2680)
Starting ACT thread...
TPM command server listening on port 2321
Platform server listening on port 2322
root@0a72a66cbc25:/ibmswtpm2/src# tpm2_startup -T mssim:host=localhost,port=2321
Command IPv4 client accepted
Platform IPv4 client accepted
WARNING:esys:src/tss2-esys/api/Esys_Startup.c:212:Esys_Startup_Finish() Received TPM Error
ERROR:esys:src/tss2-esys/api/Esys_Startup.c:78:Esys_Startup() Esys Finish ErrorCode (0x000001c4)
ERROR: Esys_Startup(0x1C4) - tpm:parameter(1):value is out of range or is not correct for the context
ERROR: Unable to run tpm2_startup
Platform server listening on port 2322
TPM command server listening on port 2321
root@0a72a66cbc25:/ibmswtpm2/src# tpm2_getrandom 8 -T mssim:host=localhost,port=2321
Command IPv4 client accepted
Platform IPv4 client accepted
WARNING:esys:src/tss2-esys/api/Esys_GetCapability.c:301:Esys_GetCapability_Finish() Received TPM Error
ERROR:esys:src/tss2-esys/api/Esys_GetCapability.c:106:Esys_GetCapability() Esys Finish ErrorCode (0x00000100)
ERROR: Esys_GetCapability(0x100) - tpm:error(2.0): TPM not initialized by TPM2_Startup or already initialized
ERROR: Unable to run tpm2_getrandom
Platform server listening on port 2322
TPM command server listening on port 2321
root@0a72a66cbc25:/ibmswtpm2/src#
With tpm2_startup -c -T mssim:host=localhost,port=2321 it worked for me.