Database export improvements
This is an umbrella issue to track all ideas relating to the database export.
- [x] Permalink (#60)
- [ ] Currently, the db export takes quite a while. This post contains some ideas from @larspetrus for improving it.
- [ ] We've received reports that the TSV export has grown so large that it no longer fits into MS Excel. We don't necessarily have to do anything about this (there are third-party tools that split TSV/CSV files into multiple files), but some people might really appreciate it.
- [ ] We currently have no way of knowing who uses the database export, and we have no way of contacting those people. To remedy this in the future, we could require that people use an OAuth application to download the export. (We could even get fancy and let them specify a webhook to be notified when database exports occur.) In the meantime, this is Tim's suggestion for contacting software peeps:
Another idea would be to create a wca-software-announce mailing list, which people can subscribe to through google groups. That might have the positive side effect of exposing some of the things that you're working on to people who could theoretically be interested in working on them. Then you can advertise that list by posting on speedsolving/wca site/delegates list "we have this backwards-incompatible change coming up, get ready, oh and by the way we'll announce other stuff to this list". Then ask the delegates to forward to their local software people.
I imagine we'll have a 6-month to 1-year period during which we continue to support the old database export while telling people to switch to the new one. NOTE: We'll have to update the WCA workbook assistant before we disable the old export.
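For anyone who does need to open the TSV export in Excel, splitting it is not hard. Below is a minimal Python sketch, assuming one record per line (no embedded newlines inside fields) and Excel's worksheet limit of 1,048,576 rows. `split_tsv` and the `_partN` file naming are illustrative only, not an existing WCA tool.

```python
from pathlib import Path

# Current Excel versions allow at most 1,048,576 rows per worksheet.
EXCEL_MAX_ROWS = 1_048_576


def split_tsv(src: Path, out_dir: Path, max_rows: int = EXCEL_MAX_ROWS) -> list[Path]:
    """Split src into parts of at most max_rows lines each, repeating the header line.

    Assumes one record per line (no embedded newlines inside fields).
    """
    if max_rows < 2:
        raise ValueError("max_rows must leave room for the header and at least one row")
    out_dir.mkdir(parents=True, exist_ok=True)
    per_part = max_rows - 1  # data lines per part; one line is reserved for the header
    parts: list[Path] = []
    out = None
    written = 0
    try:
        # newline="" keeps the original line endings untouched
        with src.open("r", encoding="utf-8", newline="") as f:
            header = f.readline()
            for line in f:
                # Start a new part on the first data line or when the current part is full
                if out is None or written == per_part:
                    if out is not None:
                        out.close()
                    part = out_dir / f"{src.stem}_part{len(parts) + 1}{src.suffix}"
                    out = part.open("w", encoding="utf-8", newline="")
                    out.write(header)
                    parts.append(part)
                    written = 0
                out.write(line)
                written += 1
    finally:
        if out is not None:
            out.close()
    return parts
```

Repeating the header in every part means each file can be opened in Excel on its own, at the cost of one row per part.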
The link to my ideas goes to a blank page saying “There is no group named wca-software.”
Ah, shoot, we renamed the group from wca-software to wca-software-public. I've updated the original link to reflect this.
I've ticked the first bullet point, as that's been taken care of. I'll close this issue unless people feel the other points are still relevant.
1. We worked around the db export slowness by getting rid of the human element. Instead, the db export is generated in a cron job that just gets slower every week, and we don't really notice it. We'll probably need to revisit that someday, but 🤷 as to when we should do that.
2. I don't feel it's our job to work around limitations in Excel.
3. I think we should eventually semver our public export. CC-ing @lgarron because we've talked about this recently.
I don't think it's worth tracking 1) or 2). It might be worth creating a new issue to discuss 3).
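To illustrate what semver would buy consumers: a backwards-incompatible schema change would bump MAJOR, so a script could refuse to run against an export it can't read instead of silently breaking. A minimal sketch, assuming (hypothetically) that each export ships a `MAJOR.MINOR.PATCH` version string; `parse_version` and `is_compatible` are illustrative names, not existing WCA code.

```python
import re
from typing import NamedTuple

# Hypothetical: assumes each export carries a version string such as "2.1.0".
SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    m = SEMVER.match(text.strip())
    if m is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    return Version(*(int(g) for g in m.groups()))


def is_compatible(supported: str, export: str) -> bool:
    """True if a consumer written against `supported` can read `export`.

    Only a MAJOR bump signals a breaking change. MINOR (additive) and PATCH
    (fix) bumps are safe, as long as the export is not older than the version
    the consumer was written against (it might lack newer columns).
    """
    s, e = parse_version(supported), parse_version(export)
    return e.major == s.major and (e.minor, e.patch) >= (s.minor, s.patch)
```

For example, a script written against `2.1.0` would accept `2.3.0` but reject `3.0.0` and `2.0.9`.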
Is this still relevant?
Chiming in from a DPO perspective on 3). For the right to be forgotten, it is the WCA's duty to request data removal from known data processors. We typically just write off the public database export as going to unknown data processors, since we could not reasonably track down everyone who has downloaded it and have them remove the PII. Tracking downloads would help a lot with this if we wanted to put it on our radar, which is probably a good idea.
I definitely agree that (even if we had perfect download tracking) chasing everybody who ever downloaded the export is not at all reasonable or feasible. What would be the advantage in implementing tracking, which you said "is probably a good idea"?
It’s a good idea if we want to pursue those data processors for deletion of personal data. But I’m not even sure that’s something we would want to do.
I'm not even sure it's something that we could do even if we desperately wanted to. What would that look like in practice, anyway? Send an email to everybody who ever downloaded the export and have them open the SQL file and delete a few statements?
Yeah, or my thought would be: any time we make an update like that, have something trigger an email asking everyone to delete the old database export and download a new one. But again, that's probably not worth it considering the very large number of people who have access to the export.
And also considering that just sending an email is not very... binding? Sorry, I'm missing a good English word here; what I mean is that emails are easy to consciously ignore (or even unconsciously forget).
Right, the specific wording in GDPR iirc is “inform known data processors”, so it doesn’t matter whether the data processor does what we ask them to do or not. But if we were ever to get into legal trouble because someone said our data processors didn’t remove their data, we’d have the emails to prove it.
Anyway, I'd like to point out that currently the people who download the database are unknown, so we don’t/can’t inform them to remove affected data.
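To make "inform known data processors" concrete: if downloads were tied to an OAuth application (as suggested at the top of this issue), a deletion request could be turned into a list of contacts to email. A minimal in-memory sketch; the `Download` record, its fields, and `DownloadLog` are hypothetical, and a real version would persist this in the database rather than in a list.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Download:
    # Hypothetical record of who fetched which export, e.g. via an OAuth application.
    oauth_app_id: str
    contact_email: str
    export_date: date


@dataclass
class DownloadLog:
    downloads: list[Download] = field(default_factory=list)

    def record(self, app_id: str, email: str, export_date: date) -> None:
        self.downloads.append(Download(app_id, email, export_date))

    def processors_to_inform(self, removed_on: date) -> list[str]:
        """Contacts holding an export generated before `removed_on`, i.e. one
        that may still contain the now-removed personal data (deduplicated, sorted)."""
        emails = {d.contact_email for d in self.downloads if d.export_date < removed_on}
        return sorted(emails)
```

Whether anyone acts on the resulting emails is out of our hands, but the log would at least show who was informed.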
It looks like the database export hasn't run since April 11th, 2023. Is this a known problem?
Closing this. Two remaining issues were:
- speed of database exports. This should be addressed as part of an upcoming (TM) Results rework; an issue will be created when a need is demonstrated.
- tracking database exports. @Jambrose777 please feel free to create a separate issue if this is something you'd like us to put on the backlog for WRT.