Compliance-friendly workflow data retention
Author: Drew Hoskins
Summary of the feature being proposed
- Can we measure retention from the start of the workflow rather than from its closure?
- Can we have published guidance on how long the deletion takes so we can assess the compliance of retention policies?
Secondary helpful ideas:
- Could we have per-workflow-type retention policies, rather than only per-namespace ones, so that we don't have to create a separate namespace for each retention policy?
- Validate whether workflow timeouts are longer than the namespace's retention policy, as a sanity check.
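The sanity check in the last bullet could look roughly like the following Python sketch. It assumes retention is measured from workflow start and models an unset (infinite) execution timeout as `None`; `validate_timeout_against_retention` is a hypothetical helper, not an existing Temporal server or SDK API.

```python
from datetime import timedelta
from typing import Optional


def validate_timeout_against_retention(
    workflow_type: str,
    execution_timeout: Optional[timedelta],
    retention: timedelta,
) -> None:
    """Raise ValueError if a workflow of this type could still be open
    when its start-anchored retention window lapses (hypothetical check)."""
    # None stands in for an unset execution timeout: the workflow may run forever.
    if execution_timeout is None:
        raise ValueError(
            f"{workflow_type}: infinite execution timeout exceeds retention of {retention}"
        )
    # A timeout equal to the retention is allowed: the workflow closes no later
    # than the moment its data becomes due for deletion.
    if execution_timeout > retention:
        raise ValueError(
            f"{workflow_type}: execution timeout {execution_timeout} "
            f"exceeds retention of {retention}"
        )
```

Under this check, a 21-day timeout passes against a 30-day retention, while an unset timeout is always rejected.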
What value does this feature bring to Temporal?
Compliance/regulatory regimes typically dictate data retention for sensitive information. One can either adhere to, or avoid being subject to, such regimes using data retention. For example,
- Certain categories of Indian nationals' data cannot be exfiltrated from India and persisted for more than 24 hours. Because one can't have a retention policy of less than 1 day, it's currently impossible to exfiltrate Indian nationals' data in a compliant way and have it stored in Temporal metadata.
- Per GDPR, data takedown requests for personally identifiable information (PII) must be processed within N days (and one can avoid needing to process takedown requests at all by having a retention policy within that limit). Supposing a 30-day limit, allowing a 30-day policy measured from workflow start would be more straightforward and easier for users to understand. It would also avoid games like "run the workflow for up to 3 weeks and then allow 9 days of retention." That workaround isn't ideal: for example, a workflow that finishes instantly is retained for only 9 days, when you'd rather retain it longer for debuggability.
For this to work, the retention policy should mean (and be documented to mean) that the data will be deleted by that point (assuming the server is up and operating normally), rather than merely being scheduled for deletion once the retention window lapses.
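To make the arithmetic in the GDPR example concrete, here is a small Python sketch comparing a close-anchored deadline (today's behavior) with the proposed start-anchored one. `deletion_deadline` and its `anchor` parameter are hypothetical names for illustration, and the sketch ignores any lag between the deadline and the actual deletion.

```python
from datetime import datetime, timedelta


def deletion_deadline(
    started: datetime, closed: datetime, retention: timedelta, anchor: str
) -> datetime:
    """Latest time the workflow's data may still exist (hypothetical helper)."""
    if anchor == "close":
        # Current model: the retention window starts when the workflow closes.
        return closed + retention
    if anchor == "start":
        # Proposed model: the retention window starts when the workflow starts.
        return started + retention
    raise ValueError(f"unknown anchor: {anchor!r}")


LIMIT = timedelta(days=30)  # assumed regulatory limit from the example
start = datetime(2024, 1, 1)

# Workaround under close-anchoring: cap runs at 21 days and retain for 9,
# so even the slowest workflow is gone 30 days after it started.
slow_close = start + timedelta(days=21)
print(deletion_deadline(start, slow_close, timedelta(days=9), "close") - start)  # 30 days, 0:00:00

# The same 9-day policy keeps an instantly finishing workflow for only 9 days...
print(deletion_deadline(start, start, timedelta(days=9), "close") - start)  # 9 days, 0:00:00
# ...whereas a start-anchored 30-day policy keeps it for the full 30.
print(deletion_deadline(start, start, LIMIT, "start") - start)  # 30 days, 0:00:00
```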
Are you willing to implement this feature yourself?
Not sure. We don't have much experience editing temporal-server, but I wouldn't rule it out, given sufficient guidance from the core team.
Hey @drewhoskins-stripe, thanks for the request/proposal. We will need some time to understand how this aligns with our plans/priorities but in the meantime I have a follow up question.
Re: retention which tracks its offset from the start of the Workflow, what is the expected behavior if that retention period ends while the Workflow is still Open? Right now, Retention is only a concept that exists for Closed Workflows. Would that result in forceful termination, eviction immediately when the Workflow organically closes, or something else?
Well, you could leave this behavior undefined by validating that no workflow timeouts are longer than the retention period; I can't think of a scenario where this would not be a bug on the user's part. But yeah, I suspect if you defined the behavior, you would need to forcefully terminate and then delete the data to be compliant with regulations.
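The two options in this reply (reject at start via validation, or terminate and then delete when the window lapses) differ only for workflows still open at the deadline. Below is a hypothetical Python sketch of the terminate-then-delete rule, assuming retention is measured from workflow start; the `RetentionAction` enum and `retention_action` function are illustrative names, not Temporal APIs.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class RetentionAction(Enum):
    KEEP = "keep"
    DELETE = "delete"
    TERMINATE_AND_DELETE = "terminate_and_delete"


def retention_action(
    now: datetime, started: datetime, closed: Optional[datetime], retention: timedelta
) -> RetentionAction:
    """Decide what a (hypothetical) retention enforcer would do with one workflow at `now`."""
    if now < started + retention:
        # Window has not lapsed yet, whether the workflow is open or closed.
        return RetentionAction.KEEP
    if closed is None:
        # Still open at the deadline: terminate it first so no new history is
        # written, then delete its data to stay within the retention period.
        return RetentionAction.TERMINATE_AND_DELETE
    return RetentionAction.DELETE
```

With the start-time validation in place, the last branch should only be hit if a workflow somehow outlives its own timeout.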
Ok this is useful input. I will work with the team to understand if and where this falls priority/timeline wise.
As a note, the default Workflow timeout is infinite, and that's what the majority of Temporal users use (and what we recommend), so tying anything to that would be problematic.
As you shift to take on more product use cases above the traditional infrastructure use cases, I suspect you'll find that this is not a tenable recommendation for many under common compliance regimes like GDPR.
What do we envision here with respect to failure modes? Let's assume for a minute that deletion (in the happy case) is performed by some sort of background processing routine, and that on a given day that routine failed or was overloaded and didn't complete the deletion within the allotted time. Would some sort of reporting be required, or is eventual consistency with the requirements considered "good enough"?
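One possible answer, sketched here under the assumption that each workflow records a deletion deadline and the background routine records when it actually deleted the data, is to report every miss rather than rely silently on eventual consistency. `RetentionRecord` and `missed_deadlines` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class RetentionRecord:
    workflow_id: str
    deadline: datetime              # time by which the data must be gone
    deleted_at: Optional[datetime]  # None if the routine has not deleted it yet


def missed_deadlines(records: List[RetentionRecord], now: datetime) -> List[str]:
    """Return IDs of workflows whose deletion missed its deadline, either
    because it happened late or because the data still exists past the deadline."""
    missed = []
    for record in records:
        if record.deleted_at is None:
            if now > record.deadline:
                missed.append(record.workflow_id)  # overdue and still present
        elif record.deleted_at > record.deadline:
            missed.append(record.workflow_id)      # deleted, but too late
    return missed
```

A non-empty result is what a compliance report or alert could be built on; an empty one means the routine met its deadline for every record checked.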