
Use timestamp in aggregate table

Open · zachthompson opened this issue 5 years ago • 5 comments

No longer need the crawl ID since we are aggregating directly. We usually need to decode it into a timestamp anyhow.
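For context, the change amounts to storing the crawl timestamp in the aggregate table directly instead of a crawl ID that consumers then decode. A minimal sketch of the difference — the `https_crawls` table, the `last_crawled` column, and all column names other than `max_https_crawl_id` are assumptions for illustration, not taken from the real schema:

```sql
-- Hypothetical: with a crawl ID, every consumer must join back
-- to the crawl table (here assumed to be "https_crawls") to
-- recover the time of the crawl.
SELECT a.domain, c.started_at
FROM https_crawl_aggregate a
JOIN https_crawls c ON c.id = a.max_https_crawl_id;

-- With a timestamp stored directly, the join goes away:
SELECT domain, last_crawled
FROM https_crawl_aggregate;
```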

zachthompson avatar Nov 19 '19 20:11 zachthompson

We will alter the table and restart the service(s). Shouldn't need an additional PR.

Updated CONTRIBUTING.
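The in-place alteration described above could look something like the following in PostgreSQL. This is only a sketch: the replacement column name (`last_crawled`), its type, and the backfill step are assumptions, not the actual migration.

```sql
-- Assumed migration sketch; run before restarting the service(s).
BEGIN;

ALTER TABLE https_crawl_aggregate
    ADD COLUMN last_crawled TIMESTAMP WITH TIME ZONE;

-- ...backfill last_crawled from the old crawl IDs here, if the
-- existing rows still need their crawl times preserved...

ALTER TABLE https_crawl_aggregate
    DROP COLUMN max_https_crawl_id;

COMMIT;
```

Wrapping both steps in one transaction keeps readers from ever seeing a table with neither column.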

zachthompson avatar Nov 20 '19 23:11 zachthompson

My point was that some internal code still works with the removed field. If you drop https_crawl_aggregate.max_https_crawl_id, won't that code break? For example:

https://dub.duckduckgo.com/duckduckgo/ddg/blob/bttf/components/https/create-https-queue.pl#L75
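To make the concern concrete: once the column is dropped, any query that still references it fails outright rather than degrading gracefully. The query shape below is hypothetical; only the column and table names come from the thread.

```sql
-- After DROP COLUMN, PostgreSQL rejects this with:
-- ERROR: column "max_https_crawl_id" does not exist
SELECT max_https_crawl_id
FROM https_crawl_aggregate;
```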

aaronharsh avatar Nov 21 '19 01:11 aaronharsh

That specific script is on deck for an update beyond the ID. We can run it prior to merging the internal PR to give us a little leeway, but if necessary the queue can be populated manually.

zachthompson avatar Nov 21 '19 15:11 zachthompson

It sounds like it's expected that this change makes the schema incompatible with internal scripts like create-https-queue.pl. And you're trying to get this change out to the public repo quickly so that anyone else who forks the repo will get the updated schema. Am I reading that right?

I get the motivation. It would be nice if the code were in a working state, not least because the current state makes it tougher for me to review. Would you like help cleaning it up? In particular, I'm thinking of things like:

  • Get rid of the duplicate https_crawl.pl in components/https
  • Make all the scripts work with the new schema
  • Get rid of any unused scripts

aaronharsh avatar Nov 21 '19 17:11 aaronharsh

@aaronharsh Let me see if I can clear up the confusion.

  1. The code is in a working state. You can test this on a local PostgreSQL instance and it works fine.
  2. It doesn't affect anything internal yet. The integration PR has the submodule pointed at a previous commit with crawl ID still there.
  3. Once this is merged, I will open another PR internally to address any legacy references to crawl ID, bump the submodule commit, and we can verify it end-to-end internally.

zachthompson avatar Nov 25 '19 21:11 zachthompson