Compliance-friendly workflow data retention

Open drewhoskins-stripe opened this issue 1 year ago • 5 comments

Author: Drew Hoskins

Summary of the feature being proposed

  • Can the retention period be measured from the start of the workflow rather than from its closure?
  • Can we have published guidance on how long the deletion takes so we can assess the compliance of retention policies?

Secondary helpful ideas:

  • Have per-workflow-type retention policies rather than just per-namespace ones, so that we don't have to create a separate namespace for each retention policy.
  • Validate that workflow timeouts are not longer than the namespace's retention period, as a sanity check.
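
The sanity check in the last bullet could be sketched as follows. This is a minimal illustration in plain Python, not Temporal server code: `validate_timeout_against_retention` and its parameters are hypothetical names, and `None` is assumed to stand for an unbounded workflow timeout.

```python
from datetime import timedelta
from typing import List, Optional


def validate_timeout_against_retention(
    workflow_timeout: Optional[timedelta],
    retention: timedelta,
) -> List[str]:
    """Return a list of warnings; an empty list means the check passed.

    Hypothetical helper: assumes retention is measured from workflow start,
    so a workflow must not be able to outlive the retention period.
    """
    warnings: List[str] = []
    if workflow_timeout is None:
        # Assumption: None models an infinite (unset) workflow timeout.
        warnings.append(
            "workflow has no timeout, so it may outlive the retention period"
        )
    elif workflow_timeout > retention:
        warnings.append(
            f"workflow timeout {workflow_timeout} exceeds "
            f"retention period {retention}"
        )
    return warnings


# A 21-day timeout fits inside a 30-day retention period; a 31-day one does not.
print(validate_timeout_against_retention(timedelta(days=21), timedelta(days=30)))
print(validate_timeout_against_retention(timedelta(days=31), timedelta(days=30)))
```

Returning warnings rather than raising leaves it to the caller to decide whether a violation should block workflow registration or only be reported.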

What value does this feature bring to Temporal?

Compliance/regulatory regimes typically dictate data retention for sensitive information. One can either adhere to such regimes, or avoid being subject to them, by using data retention policies. For example:

  • Certain categories of Indian nationals' data cannot be exfiltrated from India and persisted for more than 24 hours. Because one can't have a retention policy of less than 1 day, it's currently impossible to exfiltrate Indian nationals' data in a compliant way and have it stored in Temporal metadata.
  • Per GDPR, takedown requests for personally identifiable information (PII) must be processed within N days (and one can avoid needing to process takedown requests at all by having a retention policy that's within that limit). Supposing a 30-day limit, allowing a 30-day policy measured from workflow start would be more straightforward and easier for users to understand. It would also avoid games like "run the workflow for up to 3 weeks and then allow 9 days of retention." That workaround isn't ideal: for example, when the workflow finishes instantly, its data is retained for only 9 days when you'd rather retain it longer for debuggability.
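
The arithmetic behind the 30-day example can be sketched as below. This is a minimal illustration in plain Python, assuming a limit of 30 days counted from workflow start; the function names and the sample start date are hypothetical and do not correspond to any Temporal API.

```python
from datetime import datetime, timedelta

LIMIT = timedelta(days=30)  # assumed compliance limit, counted from workflow start


def deletion_deadline_from_close(closed_at: datetime, retention: timedelta) -> datetime:
    """Current model: the retention window starts when the workflow closes."""
    return closed_at + retention


def deletion_deadline_from_start(started_at: datetime, retention: timedelta) -> datetime:
    """Proposed model: the retention window starts when the workflow starts."""
    return started_at + retention


start = datetime(2023, 3, 1)  # arbitrary example start time

# Workaround today: cap the run at 21 days and retain for 9 days after close.
worst_case = deletion_deadline_from_close(start + timedelta(days=21), timedelta(days=9))
instant = deletion_deadline_from_close(start, timedelta(days=9))

# Proposal: retain for 30 days from start, however long the run takes.
proposed = deletion_deadline_from_start(start, LIMIT)

print((worst_case - start).days)  # 30: within the limit, but only in the worst case
print((instant - start).days)     # 9: data is gone 21 days earlier than necessary
print((proposed - start).days)    # 30: the full window regardless of run length
```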

For this to work, the retention policy should mean (and be documented to mean) that the data will have been deleted by that point, assuming the server is up and operating normally, rather than merely being scheduled for later deletion once the retention window lapses.

Are you willing to implement this feature yourself?

Not sure. We don't have much experience editing temporal-server, but I wouldn't rule it out, given sufficient guidance from the core team.

drewhoskins-stripe, Mar 03 '23 01:03

Hey @drewhoskins-stripe, thanks for the request/proposal. We will need some time to understand how this aligns with our plans/priorities, but in the meantime I have a follow-up question.

Re: retention which tracks its offset from the start of the Workflow, what is the expected behavior if that retention period ends while the Workflow is still Open? Right now, Retention is only a concept that exists for Closed Workflows. Would that result in forceful termination, eviction immediately when the Workflow organically closes or something else?

rylandg, Mar 09 '23 18:03

> Re: retention which tracks its offset from the start of the Workflow, what is the expected behavior if that retention period ends while the Workflow is still Open? Right now, Retention is only a concept that exists for Closed Workflows. Would that result in forceful termination, eviction immediately when the Workflow organically closes or something else?

Well, you could leave this behavior undefined by validating that no workflow timeouts are longer than the retention period; I can't think of a scenario where this would not be a bug on the user's part. But yeah, I suspect if you defined the behavior, you would need to forcefully terminate and then delete the data to be compliant with regulations.

drewhoskins-stripe, Mar 15 '23 18:03

OK, this is useful input. I will work with the team to understand if and where this falls in terms of priority and timeline.

As a note, the default Workflow timeout is infinite, and that's what the majority of Temporal users use (and what we recommend), so tying anything to that would be problematic.

rylandg, Mar 17 '23 19:03

> the default Workflow timeout is infinite and that's what majority of Temporal users use (and we recommend)

As you shift to take on more product use cases above the traditional infrastructure use cases, I suspect you'll find that this is not a tenable recommendation for many under common compliance regimes like GDPR.

drewhoskins-stripe, Mar 21 '23 17:03

What do we envision here with respect to failure modes? Let's assume for a minute that the deletion (in the happy case) is performed by a background processing routine of some sort, and that on a given day the routine failed or was overloaded and didn't complete the action within the allotted time. Would some sort of reporting be required? Or is eventual consistency with the requirements considered "good enough"?

paulnpdev, Jun 14 '24 18:06