Prune user_project_info upon project archive

Open cpeel opened this issue 4 years ago • 1 comments

user_project_info maps some high-level user actions to a project and powers things like the My Projects page and project notifications. It also has an interesting feature: when a user visits a project page, their last visit time for the project is stored in this table as well. This table is not currently archived, grows without bound, and is the third-largest table on pgdp.net (after page_events and past_tallies). We should prune this table upon project archive (with certain restrictions) and prevent new rows from being created for archived projects.

Addressing this will include:

  • Updating pinc/archive.inc to delete user_project_info rows for projects being archived where user_project_info.t_latest_page_event == 0
  • Updating projects.php to not call upi_set_t_latest_home_visit() if the project is archived.
  • Creating an upgrade script to delete the necessary rows for projects that have already been archived.

When archiving, it is important not to simply delete all project-related rows in the table, as doing so will break My Projects. Instead, we should delete only the project-related rows where user_project_info.t_latest_page_event == 0. This removes every row for the project except those where the user has saved a page. Note that it will also remove records of users who have subscribed to project events, but after a project is archived that data is irrelevant.
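To make the rule concrete, here is a minimal sketch of the delete, written in Python against an in-memory SQLite table rather than the site's PHP and MySQL code. Only the table name and t_latest_page_event come from this issue; the username and projectid columns, the exact shape of t_latest_home_visit, the helper names, and the sample rows are assumptions made for illustration.

```python
import sqlite3


def build_demo_db():
    """Create a simplified, hypothetical stand-in for user_project_info."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_project_info ("
        " username TEXT,"            # assumed column name
        " projectid TEXT,"           # assumed column name
        " t_latest_home_visit INTEGER DEFAULT 0,"
        " t_latest_page_event INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO user_project_info VALUES (?, ?, ?, ?)",
        [
            ("alice", "projectID_archived", 100, 200),  # saved a page
            ("bob", "projectID_archived", 150, 0),      # only visited
            ("carol", "projectID_archived", 0, 0),      # e.g. only subscribed
            ("bob", "projectID_active", 160, 0),        # different project
        ],
    )
    return conn


def prune_user_project_info(conn, projectid):
    """Delete one project's rows where the user never saved a page.

    Returns the number of rows deleted.
    """
    cur = conn.execute(
        "DELETE FROM user_project_info "
        "WHERE projectid = ? AND t_latest_page_event = 0",
        (projectid,),
    )
    return cur.rowcount


if __name__ == "__main__":
    conn = build_demo_db()
    deleted = prune_user_project_info(conn, "projectID_archived")
    print("rows deleted:", deleted)
    for row in conn.execute("SELECT * FROM user_project_info"):
        print(row)
```

In this sketch the row for the user who saved a page survives, so My Projects would still have something to show for them, while visit-only and subscription-only rows for the archived project are removed. Rows for other projects are left alone.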

cpeel, Apr 25 '21 19:04

I'm somewhat uncomfortable with completely deleting the rows, because there are rare occasions on which archived projects are unarchived. I definitely agree that it would be good to be able to trim the user_project_info table, though, so I could be convinced.

srjfoo, Apr 25 '21 19:04