rgsoc-teams icon indicating copy to clipboard operation
rgsoc-teams copied to clipboard

Add rake task to strip down production database dump

Open carpodaster opened this issue 7 years ago • 6 comments

Why? We have a staging environment whose free database tier only supports up to 10,000 rows. We occasionally need to copy the current prod DB to staging in order to QA things.

We used to flush the Teams App's DB every year but since the introduction of the Season concept, we don't do that anymore. This is why the DB keeps growing.

What? The list below shows how many records each ActiveRecord model has (as of April 2nd).

  • The easiest way to solve this would be to delete data for old seasons, but ideally we also keep some old data
  • The current season should be left untouched
  • Accepted teams shouldn't be purged
  • Bonus: sanitize the data (user data, team names, application data, comments) so that we may use the DB dump as a developer's snapshot
ApplicationRecord
  .descendants
  .map { |klass| { klass.to_s => klass.count } }
  .sort_by(&:values)

# [{"Season"=>4}, {"ConferenceAttendance"=>22}, 
# {"Note"=>31}, {"Mailing"=>40}, {"Conference"=>103}, {"ConferencePreference"=>112}, 
# {"Maintainership"=>153}, {"Source"=>161}, {"Project"=>196}, {"Mentor::Comment"=>229}, 
# {"Application"=>542}, {"ApplicationDraft"=>699}, {"Team"=>782}, {"Todo"=>1158}, 
# {"Rating"=>2067}, {"Comment"=>2507}, {"Role"=>3150}, {"Submission"=>3187}, 
# {"Activity"=>3564}, {"User"=>3661}]

carpodaster avatar Apr 05 '18 07:04 carpodaster

Just a thought 🤔 : in regards of what just happened at Travis CI (and what previously also happened at Gitlab) we should probably prevent this from accidentally being run in the productionenvironment 😉

I'd therefore suggest to add a 4th environment: staging (don't know why we do not have it already)_ that is basically a clone of production and run the staging app in this environment.

klappradla avatar Apr 05 '18 16:04 klappradla

I'm not a fan of extra envs; it's also not advised by Heroku, but I'm also not totally against it :)

I've come across this Twitter thread that has some retrospective on AR meta data about the env its DB is configured for. We could test if that actually works.

Or we could abort if there is a DATABASE_URL set in dev or test envs.

Or we could just YOLO it, knowing that the potential impact isn't as bad; we have daily DB dumps, our app is not a high traffic one and restoring prod from the recent dev dump is a matter of 10 minutes, not several hours.

carpodaster avatar Apr 05 '18 18:04 carpodaster

You're probably right with your YOLO-point 👍

For the Twitter thread and the check on a DATABASE_URL env: I'm sorry, I did not make my point clear enough above 🙈 I'm not really concerned about running things from within the dev / test env against the production database. I was more referring to doing something like:

heroku run rails db:shove -a production

or:

heroku run rails db:shove

and it defaults to production.

Anyways, was just something that came to my mind when reading the issue. I think your YOLO-point is valid and making other things obsolete ✔️

klappradla avatar Apr 06 '18 17:04 klappradla

We have 552 users in the database without a github_id. We can probably delete those? I suppose they originate from adding a team member and using an invalid github_handle.

irb(main):003:0> User.where(github_id: nil).count => 552

For example: https://teams.railsgirlssummerofcode.org/community?utf8=%E2%9C%93&search=Vinh+Nguyen+Le&commit=Go

emcoding avatar May 24 '18 12:05 emcoding

@F3PiX good idea! Sounds like a good (late) spring cleaning

carpodaster avatar May 24 '18 12:05 carpodaster

I'll add it to the User Controller spring cleaning then 😂 . Working on it - that's where I stumbled upon it.

emcoding avatar May 24 '18 13:05 emcoding