dproofreaders

automodify logs no longer have the timezone in the filename

Open jmdyck opened this issue 9 months ago • 9 comments

When the automodify cron job became a BackgroundJob (Feb 2025), the timezone was dropped from the name of the log file that automodify creates. Mostly, this doesn't matter. However, when Daylight Saving Time ends in November, the server's time will switch from 3am EDT to 2am EST, and (if the filenames are still timezone-less), the files of the preceding hour will likely be overwritten by the files of the subsequent hour. (It's only "likely", not certain, because we include seconds in the filename, and there's a bit of wiggle there, so a collision isn't guaranteed.)

We should probably re-instate the timezone in the filename.

Other solutions (for completeness):

  • Switch to using UTC/GMT in the filename. (Though that might make it a bit harder for users to deal with the files.)
  • Switch to using Unix timestamp in the filename. (Even harder.)
  • Disable automodify for an hour around the EDT->EST transition.
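
To make the collision concrete, here is a minimal Python sketch (not the project's actual code) of how a timezone-less local-time filename repeats across the autumn transition, while a name carrying the zone abbreviation or a UTC-based name does not. The filename pattern, the function names, and the `America/New_York` zone are assumptions made for illustration; `zoneinfo` also assumes tz data is available on the system.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: the server keeps US Eastern time.
EASTERN = ZoneInfo("America/New_York")

# Hypothetical filename patterns; the real automodify pattern may differ.
def local_name(instant):
    """Local wall-clock time, no timezone in the name."""
    return instant.astimezone(EASTERN).strftime("%Y-%m-%d_%H%M%S.txt")

def local_name_with_zone(instant):
    """Local wall-clock time plus the zone abbreviation (EDT/EST)."""
    return instant.astimezone(EASTERN).strftime("%Y-%m-%d_%H%M%S_%Z.txt")

def utc_name(instant):
    """UTC time, which never repeats."""
    return instant.astimezone(timezone.utc).strftime("%Y-%m-%d_%H%M%S_UTC.txt")

# Two runs exactly one hour apart, straddling the end of DST on 2025-11-02.
before = datetime(2025, 11, 2, 5, 30, 0, tzinfo=timezone.utc)  # 01:30:00 EDT
after = datetime(2025, 11, 2, 6, 30, 0, tzinfo=timezone.utc)   # 01:30:00 EST

for make_name in (local_name, local_name_with_zone, utc_name):
    first, second = make_name(before), make_name(after)
    status = "COLLISION" if first == second else "distinct"
    print(f"{make_name.__name__}: {first} / {second} -> {status}")
```

Only the first pattern yields the same name for both runs, and only when the seconds happen to match, which is why a collision is likely rather than certain.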

jmdyck avatar Mar 22 '25 22:03 jmdyck

Note that, before the change to BackgroundJob, the filename was determined in the crontab file. The pre-BackgroundJob version of dp.cron.template doesn't show a timezone being included in the filename, so we must have (years ago) added the timezone in the server's crontab and not reflected that change back to dp.cron.template. (In case anyone looks and is puzzled.)

jmdyck avatar Mar 22 '25 22:03 jmdyck

What is the value of these log files to general users (compared to SAs who need to know the job ran successfully)? We go to a lot of effort to generate them, make them accessible to users, and keep them (archiving) and I'm not clear why.

cpeel avatar Mar 22 '25 22:03 cpeel

> What is the value of these log files to general users (compared to SAs who need to know the job ran successfully)? We go to a lot of effort to generate them, make them accessible to users, and keep them (archiving) and I'm not clear why.

Fair point. I think I've only once looked at one of the older logs, and I think I decided that what I thought I wanted to get from it really wasn't useful. I have periodically used the most recent ones, but rarely more than a few days old.

srjfoo avatar Apr 20 '25 04:04 srjfoo

It finally occurred to me to look at the automodify logs for 2025-11-02 to see what happened (note that at 2am, the clock rolls back to 1am, not 3 to 2). Looking at the screenshot, 2 were overwritten, 2 were not.

Image

I still have no real idea if we should worry about the overwritten logs. In a pinch, if we really needed information about something that was logged in an overwritten file, we can do a certain amount of reconstruction based on project_events; perhaps not in as much detail, but hopefully enough to learn what we need to know.

Should retention of the archived automodify logs be aligned with the new data retention policy?

srjfoo avatar Nov 15 '25 06:11 srjfoo

I'm still trying to understand what value these actually have beyond "we've always created them and have always kept them".

cpeel avatar Nov 17 '25 00:11 cpeel

When someone asks why a project was unexpectedly released, I think the relevant automodify log is often the easiest way to answer the question. There are other questions it can answer, but I think that's the commonest. @srjfoo, is that what you've used them for?

They can also be useful as a trace, to confirm that automodify is operating as it should, after some change to the code or the queues. But that's probably rarer.

It might be more fruitful to ask about their value in pfs-and-squirrels.

jmdyck avatar Nov 17 '25 03:11 jmdyck

@jmdyck, yes, that's the commonest use, I think.

It looks like the logs go back to January of 2010. I thought that the Queue Busters might have used them, but I searched for "automodify" on about half the pages, and saw no mention of it.

Doing a quick search of the code (I'm far from an expert in using git blame), I found mention of setting up the directory for the logs from 18 years ago, but our archived logs only go back to 2010.

I hesitate to suggest getting rid of the logs altogether, because of the reasons @jmdyck mentioned, but I seriously doubt we need to keep them longer than a year. It wouldn't hurt to ask the PFs and squirrels whether anyone other than the squirrels uses them, for what, and how far back they look.

srjfoo avatar Nov 17 '25 04:11 srjfoo

One thing I wasn't sure of was when the automodify logs were first 'advertised' to PMs & general users. Looks like it was September 2019:

jmdyck avatar Nov 17 '25 15:11 jmdyck

> Doing a quick search of the code (I'm far from an expert in using git blame), I found mention of setting up the directory for the logs from 18 years ago, but our archived logs only go back to 2010.

I've got dp-sa email saying we started saving automodify logs to the filesystem on 2003-10-04. Before that, I think automodify was generating log output, but it just went into cron-mail.

jmdyck avatar Nov 17 '25 15:11 jmdyck