boulder Add CAA presence/validity information to ValidationRecords

Since this is important information about an authorization.

Sep 09 '16 18:09 rolandshoemaker

This should probably also contain the the highest label at which a record was found if one is found at all.

Sep 10 '16 21:09 rolandshoemaker

I would be interested in trying to do this. Do you have any directions/hints?

Dec 15 '17 08:12 laurencemcf

@rolandshoemaker I would be interested in implementing this. Is it still reverent?

Jul 20 '18 08:07 laurencemcf

@rolandshoemaker @jsha I would be interested in implementing this, so I was wondering if it is still relevant, and what information you think would be good to keep in the validation records.

Jul 24 '18 19:07 laurencemcf

Hey! Sorry for the silence, I completely missed your comments. This is still relevant and it would be great if you had a chance to take a stab at it. I think it makes sense to just include the CAA information for now.

Jul 24 '18 20:07 rolandshoemaker

No problem. So from my understanding of the issue you would want validation record similar to below for now? Do you have any preferences on the format?

[
    {
        "url":"http://c185da94d854a28961368b75dd634b29afcf7f60.com:5002/.well-known/...",
        "hostname":"c185da94d854a28961368b75dd634b29afcf7f60.com",
        "port":"5002",
        "addressesResolved":["127.0.0.1"],
        "addressUsed":"127.0.0.1"
    },
    {
        "caaStatus": "none|valid|invalid"
    }
]

Jul 25 '18 08:07 laurencemcf

:wave: @laurencemcf - thanks for your interest in taking this issue!

I talked a bit with @rolandshoemaker out of band. I think we've both come around to the consensus the CAA information should be captured at the authorization level and not in the authorization's validation records. This slightly complicates the work involved as it would require a database migration to add a new column to the authz table to maintain the CAA state.

The other "catch" related to this work is when an issuance for an identifier is backed by a valid authorization older than 8hrs we need to recheck CAA per the BRs and should also update the authorization's CAA state at that time. That makes CAA information a many-to-one relation with an authorization.

I apologize that I'm providing this guidance after you've already taken a crack at an implementation in #3806. Does the above make sense to you? Are you interested in closing #3806 and resubmitting a PR based on the above?

Thanks!

Jul 30 '18 17:07 cpu

@cpu No problem, I'll close #3806 and have a go at doing this way.

So to confirm we want to add a new json column to the pendingAuthorizations and authz tables (although I think the column in pendingAuthorizations table will be empty). When we do a recheck we'll add a new item into the array with the information from that recheck.

A example of the format of the new column:

[
    {
        "status": "none-found|found-valid|invalid",
        "timeChecked": "datetime"
    },
    {
        "status": "none-found|found-valid|invalid",
        "timeChecked": "datetime-8-hours-later"
    }
]

Jul 31 '18 08:07 laurencemcf

@laurencemcf Great, thanks!

So to confirm we want to add a new json column

I think we should avoid JSON columns. A separate table for CAA information would be better normalized and easier to query against. @rolandshoemaker do you concur?

@rolandshoemaker Adding an empty field to the pendingAuthz table is awkward... Do you have any thoughts there? In theory that field will get removed when we finally consolidate the two tables so maybe its a fair compromise.

Jul 31 '18 12:07 cpu

If we want to have a separate table for CAA, then due to the many-to-one relationship I think we can avoid having a column in either pendingAuthorizations or authz. These table could be left the same and we would specify a foreign key in the CAA table mapping back to authz. The definition of the table would be something like below (identifier would contain the DNS name the CAA records was found at).

CREATE TABLE `caa` (
  `id` varchar(255) NOT NULL,
  `authzID` varchar(255) DEFAULT NULL,
  `status` varchar(255) DEFAULT NULL,
  `identifier` varchar(255) DEFAULT NULL,
  `timestamp` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  CONSTRAINT `caa_authz` FOREIGN KEY (`authzID`) REFERENCES `authz` (`id`)
)

Aug 01 '18 10:08 laurencemcf

@laurencemcf Of course! That makes sense :-) Thanks for pointing that out.

@rolandshoemaker Are you OK with the separate table approach?

Aug 01 '18 12:08 cpu

I think this approach makes sense, in the past we've been way too willing to just keep stuffing stuff in existing tables when a new table would make more sense.

In terms of the schema I think the ID should be a BIGINT(20) NOT NULL AUTO_INCREMENT and the rest of the fields should be NOT NULL instead of DEFAULT NULL since we'll never want them empty. Given we only expect status to be one of three things we could also go with a ENUM, but that's not super important.

Probably we'll also want a KEY on authzID since that's likely how we'll be looking these up the majority of the time. @cpu can you think of any other indexy things here?

Aug 01 '18 19:08 rolandshoemaker

@cpu can you think of any other indexy things here?

@rolandshoemaker I think it might be prudent to put an index on the identifier field. We can join back to the authz table to get the identifier for the authorization but I can imagine scenarios where we may want to be querying the caa table by identifier. E.g. maybe for the base domain in the case of a CAA decision based on the record for example.com when the authorization was for foo.example.com

Aug 01 '18 20:08 cpu

Hm, good point. We may want to change the name of the identifier column as well, it's slightly confusing given we have a loaded definition of that term elsewhere in the codebase that this doesn't conform with. Perhaps just presentAt or something would be a bit clearer.

Aug 01 '18 20:08 rolandshoemaker

In terms of the schema I think the ID should be a BIGINT(20) NOT NULL AUTO_INCREMENT and the rest of the fields should be NOT NULL instead of DEFAULT NULL since we'll never want them empty.

@rolandshoemaker makes sense, but I was just wondering what we should do for the presentAt field when we don't find any CAA records? Do you want to include the last suffix (for com for foo.example.com), or should we make this field null?

Also is there any other information which would be useful to include in this table, e.g. whether it was a wild card check, or the validation methods allowed? I don't think either of those things would be much work to add but may make the table more useful depending on the querying patterns you envision.

Aug 02 '18 07:08 laurencemcf

I've been working on implementing this and had a question about the best way to do it.

We need to add a new RPC between the RA and SA to save the CAA recheck information (for authorizations older than 8hrs), and I was wondering if we should use this new RPC when we perform the initial validation as well. At the time of the initial validation we are already calling the FinalizeAuthorization RPC, so if we use the new RPC we would be performing two SA calls one after another. Would this be a problem (I'm not sure of the costs of preforming them)? The alternative would be to store the new CAA records into the authz object and save them during the FinalizeAuthorization call.

Aug 08 '18 09:08 laurencemcf

@rolandshoemaker makes sense, but I was just wondering what we should do for the presentAt field when we don't find any CAA records? Do you want to include the last suffix (for com for foo.example.com), or should we make this field null?

Yup null makes sense in that case.

Also is there any other information which would be useful to include in this table, e.g. whether it was a wild card check, or the validation methods allowed? I don't think either of those things would be much work to add but may make the table more useful depending on the querying patterns you envision.

Probably makes sense to include if it was a wildcard check, other than that I can't think of anything in particular, given our parameter work is still being changed I think it should be left out of any initial implementation and we can extend the table at a later date if we want to include that info.

I've been working on implementing this and had a question about the best way to do it.

We need to add a new RPC between the RA and SA to save the CAA recheck information (for authorizations older than 8hrs), and I was wondering if we should use this new RPC when we perform the initial validation as well. At the time of the initial validation we are already calling the FinalizeAuthorization RPC, so if we use the new RPC we would be performing two SA calls one after another. Would this be a problem (I'm not sure of the costs of preforming them)? The alternative would be to store the new CAA records into the authz object and save them during the FinalizeAuthorization call.

I'd need to take a closer look but I believe we can just extend the existing CAA recheck RPC response to include the information we want back. We can also extend the FinalizeAuthorization RPC request to include this information and simply alter the SA method to also store that (I'd also not be strictly opposed to simply doing two RPCs here from RA -> SA, which would be a slightly cleaner boundary, but since we will only ever be doing this after a FinalizeAuthorization call I think it's fine to just mash them together).

Aug 14 '18 18:08 rolandshoemaker