Routinator: Prefer highest-numbered, valid, complete manifest

After some discussion today, I think that if multiple copies exist (current content, cache), the most recent (highest-numbered) manifest should be used.

If the cache and the current repository both contain a manifest that:

passes all relevant checks
and thus has a complete set of files present
and thus is in its validity window

Routinator should pick the manifest with the highest number, and log a message if the repository contained a lower number than the cache.
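A minimal sketch of the proposed selection rule, assuming a simplified, hypothetical `Manifest` record (the field names and `pick_manifest` are illustrative only, not Routinator's types or API). Only candidates that pass all checks, are complete and are inside their validity window are considered; among those the highest manifestNumber wins, and a warning is logged when the repository copy is older than the cached one.

```python
import logging
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

log = logging.getLogger("manifest-selection")


@dataclass
class Manifest:
    # Hypothetical, simplified view of a manifest (not Routinator's types).
    number: int            # manifestNumber
    this_update: datetime
    next_update: datetime
    checks_pass: bool      # signature, EE certificate, hashes all verified
    files_complete: bool   # every file listed on the manifest is present

    def usable(self, now: datetime) -> bool:
        """Passes all checks, is complete and is inside its validity window."""
        return (
            self.checks_pass
            and self.files_complete
            and self.this_update <= now <= self.next_update
        )


def pick_manifest(
    cached: Optional[Manifest], fetched: Optional[Manifest], now: datetime
) -> Optional[Manifest]:
    """Prefer the usable manifest with the highest manifestNumber."""
    candidates = [m for m in (cached, fetched) if m is not None and m.usable(now)]
    if not candidates:
        return None
    # Both copies are usable: report a repository that went backwards.
    if len(candidates) == 2 and fetched.number < cached.number:
        log.warning(
            "repository manifest number %d is lower than cached number %d",
            fetched.number, cached.number,
        )
    # max() returns the first maximal element, so on equal numbers the
    # cached copy is kept; tie-breaking is discussed further down the thread.
    return max(candidates, key=lambda m: m.number)
```

An incomplete repository copy never displaces a usable cached one, regardless of its number.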

This is a trade-off: it picks the legitimate object during a replay attack. However, it also sticks to the wrong object when a CA travels back in time after an operational issue. This is not a security issue, but this implementation is closer to RFC9286.

Dec 11 '23 17:12 ties

I think using the manifest CMS signing time (which in practice is almost always present) is a better candidate for the first tiebreaker because of the issue you mentioned. If a CA is restored from backup and does not know what the last manifest number was, the number could regress; signing time should be more reliable in the absence of actual time travel.
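A sketch of the signing-time-first comparison suggested here, assuming a hypothetical `Candidate` record where the CMS signing time is optional; the manifest number is used only when a signing time is missing or both are equal. This is an illustration of the proposal, not existing Routinator behaviour.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Candidate:
    # Hypothetical fields: manifestNumber and the optional CMS signing-time.
    number: int
    signing_time: Optional[datetime]


def newer(a: Candidate, b: Candidate) -> Candidate:
    """Return the preferred of two valid manifests.

    Signing time is compared first when both carry it, because a CA restored
    from backup may regress its manifest number; otherwise (or on equal
    signing times) fall back to the manifest number.
    """
    if a.signing_time is not None and b.signing_time is not None:
        if a.signing_time != b.signing_time:
            return a if a.signing_time > b.signing_time else b
    return a if a.number >= b.number else b
```

With this ordering, a CA restored from backup that restarts at a lower number is still followed, which is exactly the behaviour the later comments argue against.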

Dec 12 '23 11:12 timbru

I would follow the MUSTs in RFC9286.

But indeed, if you hit the situation where you see equal manifestNumbers, signing time is a feasible tie-breaker. The other way to go (since the CA is mis-issuing) is to trust what the RP observed first, or to take the "narrowest matching window" of the EE cert validity (so you can even tie-break multiple identical signing times). Anyway, to me it's quite an unlikely scenario.
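The RFC-first ordering described here can be expressed as a single sort key: manifestNumber first, then signing time, then the narrowest EE certificate validity window. A minimal sketch, assuming a hypothetical `RankedManifest` record in which every candidate carries a signing time; the "trust what the RP observed first" alternative is not modelled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class RankedManifest:
    # Hypothetical view of a valid manifest (not Routinator's types).
    number: int              # manifestNumber, primary key per RFC9286
    signing_time: datetime   # CMS signing-time, first tie-breaker
    ee_not_before: datetime  # EE certificate validity window
    ee_not_after: datetime


def preference_key(m: RankedManifest):
    window: timedelta = m.ee_not_after - m.ee_not_before
    # Higher number wins, then later signing time, then the narrower EE
    # window (negated so that max() prefers the smaller timedelta).
    return (m.number, m.signing_time, -window)


def choose(candidates: List[RankedManifest]) -> RankedManifest:
    return max(candidates, key=preference_key)
```

Because tuples compare element by element, the later keys only matter when all earlier ones are equal.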

Dec 12 '23 14:12 ties

Routinator has at most two versions of a publication point: the one in the collector, which is updated from the source and used if the update succeeded (or for a certain time thereafter), and the last valid version, which is kept in the store and used as a fallback if the collector version cannot be used. What we would do now is check the manifest number in the stored version, if available, and reject the collected version if it has a lower number.

Which raises the question: what happens if we reject the collected version for this reason even though it is valid, while the stored version is stale or expired?
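A sketch of the collector/store rule just described, to make the edge case concrete. `PubPoint` and `select_version` are hypothetical names, and staleness is reduced to a single `next_update` check as an assumption. Under this rule, a valid collected version with a lower number combined with an expired stored version leaves nothing usable.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PubPoint:
    # Hypothetical summary of one version of a publication point.
    manifest_number: int
    valid: bool            # passed all checks
    next_update: datetime  # used here as the only staleness criterion


def select_version(
    collected: Optional[PubPoint], stored: Optional[PubPoint], now: datetime
) -> Optional[PubPoint]:
    """Reject the collected version if the stored one has a higher manifest
    number; otherwise fall back to the stored version only when the
    collected one cannot be used."""
    if collected is not None and collected.valid:
        if stored is None or collected.manifest_number >= stored.manifest_number:
            return collected
        # Collected version rejected for regressing the manifest number.
    if stored is not None and stored.valid and now <= stored.next_update:
        return stored
    # The open question: nothing is usable, even if `collected` was valid.
    return None
```

For example, with the stored version at number 10 expired and a valid collected version at number 9, `select_version` returns `None`.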

Dec 12 '23 15:12 partim

I think the RFC may just be wrong here. That is, just because the CA MUST increment the number doesn't mean it can't have failures where it cannot. I would personally use the signing time: it's a signed thing. And if that is not present, then use the manifest number.

Dec 12 '23 15:12 timbru

But yeah, others feel differently, so go with the RFC as the path of least resistance.

Dec 12 '23 15:12 timbru

OK, a fairly strong counter-argument against accepting a lower number with a higher signing time was brought to my attention:

If a CA forgot what it issued in the past, unfortunately that CA will have to be keyrolled out of the ecosystem; because apparently there no longer is any telling what that CA's keypair signed in the past.

This is indeed true. In practice, I don't think it matters for ROAs and other objects that need to be on a manifest, but especially with regard to RSCs and possibly other objects that are not listed on the manifest by design, the CA would have lost the ability to revoke them.

A CA key roll-over is cleanest. What I don't see clearly is how this can be easily explained to delegated/hybrid CA operators, but that is probably not Routinator's problem to solve.

Dec 12 '23 15:12 timbru

Thanks for adding this. This will lead to more deterministic behaviour in cases where a manifest number is recycled, like the one I was debugging today for a 3rd-party RPKI repository.

Apr 02 '24 13:04 ties