notifications-api icon indicating copy to clipboard operation
notifications-api copied to clipboard

first draft of unsubscribe_request table

Open leohemsted opened this issue 1 year ago • 3 comments

Looking for comments on this please!

the table links to notification, service, templates_history (and templates). it has a batch id that links to - see chris's mockups for info.

notably, just using a view for the report. I'm not sure if we want to do this or not - it's got advantage that we normalise the data and avoid needing to keep different sources of data in check. the disadvantage is that if in the future we want to add data that we dont want to store on the individual unsubscribe_request

Viewing chris's slides about how we currently think we'll batch up unsubscribe requests into reports might be useful for understanding.

https://docs.google.com/presentation/d/1L9pTMQW-4d6z2LfOdBll7qZ3qXxC_gufk8q83zAM3nY/edit#slide=id.g2da06c6c272_0_35

leohemsted avatar May 21 '24 09:05 leohemsted

questions for the room:

  • is a view okay or should we use a table?
  • do we want to delete old records, or just remove the email address PII?
  • should email address be synchronously encrypted?

leohemsted avatar May 21 '24 11:05 leohemsted

A few random thoughts. I don't think any of these are deal breakers.

  1. I think I'd like some form of retention limit on the data in the table. If a service team didn't come and download and process their report for 2 years, I'd say they've left it too late and we shouldn't be obligated to keep this data for that long. I think we should pick less than 2 years, that was to pick an extreme. I'm not sure what the right number should be. 90 days? a data retention period set by the service of up to 90 days?

  2. Kind of the same point as above but to emphasise it separately. We only give services 7 days (by default) to download their inbound text messages or they will be lost. Why should this feature be any different?

  3. You mentioned we could make the email address nullable, so we could remove that data if it was processed/reached a retention limit. I'm cautious on the use of nullable. I think for this type of data our precedent is to go for a _history table which has the personal data removed and can be kept indefinitely. Is the deviation from this pattern worth it? Or another option you suggested is just delete all of the data - I would be fine with us doing that I think. I don't have a strong opinion on which the best approach is.

  4. The point in the meeting about who processed the unsubscriptions was interesting. I really liked the idea of adding an audit event for whenever someone downloads a CSV report out of notify with personal data in! However, whether we should use the events table to record who 'processed' the unsubscriptions - I'm not so sure about that. I think I see the purpose of the events table as a security audit table, that we rarely use and interogate. Maybe it also has some purpose for debugging? Maybe? But if we wanted to show something in the user interface to our users like 'Leo Hemsted processed these on Tuesday' then I'd probably encourage us to store this not in the events table as it feels less 'audit' and more 'feature'.

  5. Bigger idea and unlikely to be helpful given the time pressure... but if we managed subscribes and not just unsubscribes in Notify, this might be easier because we've just be removing an email from a subscription...

idavidmcdonald avatar May 21 '24 14:05 idavidmcdonald

  1. Yes, I agree. Something to pick up with ahmed and chris. I like 90 days as a starting point.
  2. Good point. I think the chance of inbound messages happening randomly without recent prompting is a bit lower in my mind, at least for services that dont have api integrations, but thats a pretty weakly held assumption and a fair challenge.
  3. fair point. we have an inbound_sms_history. i think for now i will make email non-nullable, and then build a separate history table in the same pattern
  4. interesting. i'll have a think about it.
  5. noooooooo

leohemsted avatar May 23 '24 13:05 leohemsted

My thoughts:

  1. DATA RETENTION: I think 90 days limit sounds good, and later moving the unsub requests to history table, so we have a record of them - for example if services forget to action any requests for a long time and then ask us how many they have missed.

I also think it would be good to notify our users that they have unactioned unsubscribe requests that sat there for a while - maybe if they don't action the requests for 30 days, and then again 7 days before deletion?

Otherwise it's a big risk they will not notice the requests.

  1. VIEW VS TABLE: you said "the disadvantage is that if in the future we want to add data that we dont want to store on the individual unsubscribe_request" - I think that's fair, but also in the future if we really need it, we could create a table for those reports and start using that. So I think it's fine to have a view for now, as it meets our needs, and move to a table later if we need it.

  2. should email address be synchronously encrypted: I think as long as we show our users unencrypted data (since this is a feature for teams with lower tech capability), it would be good to encrypt this PII in our database, and decrypt when we are displaying it.

CrystalPea avatar May 28 '24 15:05 CrystalPea

Pea's comments:

  1. 90 day retention - sounds good. i've got a history table now and we can handle how and when data moves across later
  2. view vs table - i ended up going for a table - i think it just makes it a bit easier to tweak. the main thing that bugged me about the view is that every single unsubscribe request has a "processed_at" timestamp. But really the "processed_at" timestamp belongs to a report itself, and so having a timestamp duplicated across all the rows felt a bit weird.
  3. synchronously encrypted - pretty sure i discussed this with david and we decided its fine to have it unencrypted. makes it easier to search for a specific email address too. we dont have anything in place for encryption yet anyway so that'd just make development longer

leohemsted avatar Jun 03 '24 13:06 leohemsted

the squawk failures are all of type:


/tmp/upgrade.sql:2:2: warning: prefer-timestamptz

  note: A timestamp field without a timezone can lead to data loss, depending on your database session timezone.
  help: Use timestamptz instead of timestamp for your column type.

think this is fine as it's in line with all of the rest of our timestamps. pending a larger look at our timestamps cc @CrystalPea


/tmp/upgrade.sql:15:2: warning: prefer-bigint-over-int

  note: Hitting the max 32 bit integer is possible and may break your application.
  help: Use 64bit integer values instead to prevent hitting this limit.

this is a template_version field so i dont want to change the type as it's a foreign key.

we may want to in time look at how to filter these out

leohemsted avatar Jun 06 '24 09:06 leohemsted