smarter-encryption
smarter-encryption copied to clipboard
Use timestamp in aggregate table
No longer need the crawl ID since we are aggregating directly. We usually need to decode it into a timestamp anyhow.
We will alter table and restart the service(s). Shouldn't need an additional PR.
Updated CONTRIBUTING.
My point was that some internal code still works with the removed field. If you drop https_crawl_aggregate.max_https_crawl_id
, won't that code break? For example:
https://dub.duckduckgo.com/duckduckgo/ddg/blob/bttf/components/https/create-https-queue.pl#L75
That specific script is on deck for an update beyond the ID. We can run it prior to merging the internal PR to give us a little leeway but, if necessary, the queue can be populated manually.
It sounds like it's expected that this change makes the schema incompatible with the internal scripts like create-https-queue.pl
. And you're trying to rush this change out to the public repo so that anyone else who forks the repo will get the updated scheme. Am I reading that right?
I get the motivation. It would be nice if the code was in a working state, at the very least because it makes it tougher for me to review. Would you like help cleaning it up? In particular, I'm thinking things like:
- Get rid of the duplicate https_crawl.pl in components/https
- Make all the scripts work with the new schema
- Get rid of any unused scripts
@aaronharsh Let me see if I can clear up the confusion.
- The code is in a working state. You can test this on a local Postgresql instance and it works fine.
- It doesn't affect anything internal yet. The integration PR has the submodule pointed at a previous commit with crawl ID still there.
- Once this is merged, I will open another PR internally to address any legacy references to crawl ID, bump the submodule commit, and we can verify it end-to-end internally.