
Enable Data Producers to Set Expiration Periods for Shares:

Open anushka-singh opened this issue 11 months ago • 16 comments

Is your feature request related to a problem? Please describe.

  • Shares lack expiration dates, leading to potential issues with stale or outdated data.
  • There is no systematic process for removing invalid or outdated shares from the data pool.
  • Lack of auditing mechanisms makes it challenging to track share validity and usage.
  • Stale shares can clutter the system and may lead to misuse or confusion.
  • There's a need for shares to have time-bound validity, with both producers and consumers reviewing them periodically.

Describe the solution you'd like

1. Granular Expiration Settings:

  • Producers can set expiration periods at:

    1.1 - Dataset level: Define an expiration date range (e.g., 30-90 days) when creating a dataset.
    1.2 - Share level: Specify different expiration periods for individual shares based on requestor needs (e.g., 30 days for R1, 90 days for R2).

2. Notification System:

  • As expiration approaches (e.g., two weeks prior), the system sends notifications to requestors and/or producers to reassess the share's validity.

3. Handling Expiration:

  • Upon reaching expiration:
    3.1 - Escalation: Option to escalate to a higher authority in the chain for further review or action.
    3.2 - Revocation: Ability to revoke access to the share, potentially disrupting downstream processes dependent on the dataset.
    3.3 - Visual Indicators: Display a prominent indicator (e.g., red button) signaling the share's expired status without immediately revoking access, preventing disruptions to existing processes.

4. Configurability:

  • All time ranges are configurable, allowing flexibility to adapt to different use cases and requirements.

Describe alternatives you've considered

  • I will write a longer description in the following days, proposing different solutions, and discuss it with the team
  • All time ranges are up for discussion and will be configurable


anushka-singh avatar Mar 01 '24 15:03 anushka-singh

Solution 1:

GH1083

Please look at attached diagram for more context on what A1, A2, B1,B2 mean:

At what granularity is expiration defined?

A1: Expiration is defined at dataset level: producer sets an expiration (for example 3 months) for the dataset. When share is granted, each share on that dataset will by default have 3 months expiration. This feature will be a part of dataset import/creation modal.

A2: Expiration is defined at share level: producer sets an expiration (for example 3 months) for the dataset when share is requested. This feature can be a part of share UI.

When should notifications be sent out that expiry is approaching? And how frequently?

  • 2 weeks before share expires?
  • Should they be informed more than once in the 2 weeks?

Who should be informed that expiry is being reached?

B1: Producer - Should we inform DATA OWNERS since they are responsible for the governance of their datasets - more so than requestors of datasets?

B2: Consumer - Owner might not know if consumer needs access still. Maybe we can have a flow like: requester gets notified and has a choice -> revoke request OR resubmit request for approval -> owner now needs to approve request

B3: Inform both producer and consumer that share is reaching its expiry

If access is renewed in timely manner and expiration not reached - Do nothing

What happens when expiration is reached since access was not renewed in timely manner?

C1: Escalate to skip level: Option to escalate to a higher authority in the chain for further review or action.

C2: Revoke access to dataset: Ability to revoke access to the share, potentially disrupting downstream processes dependent on the dataset.

C3: Show visual indicator on UI: Display a prominent indicator (e.g., red button) signaling the share's expired status without immediately revoking access, preventing disruptions to existing processes.

anushka-singh avatar Mar 05 '24 17:03 anushka-singh

Solution 2: Priority setting based system

GH1083-2

Please look at attached diagram for more context on what A1, A2, B1,B2 mean:

What is the priority based system?

  • When user creates a share request, they can select one of 3 set priorities for their request.

A1. Highest priority: Priority 1 - access never expires. This is a critical dataset needed for a long term process that will never expire. User needs to add reasoning for choosing this priority that will be reviewed by producers. Producer might accept or downgrade request.

A2. Priority 2 - long term priority ~ (maybe 3 to 6 months?). If a user selects this option from dropdown while creating share request, they will be able to apply for a validity of 3 or 6 months.

A3. Priority 3 - short expiration - Users may need this for some temporary set up or experimentation. Can last about ~ 30 days and should be granted easily by producer.

If Priority 1 - share never expires and access always remains.

If Priority 2 or 3 - we continue with the workflow as shown in diagram and described below:

When should notifications be sent out that expiry is approaching? And how frequently?

  • 2 weeks before share expires?
  • Should they be informed more than once in the 2 weeks?

Who should be informed that expiry is being reached?

B1: Producer - Should we inform DATA OWNERS since they are responsible for the governance of their datasets - more so than requestors of datasets?

B2: Consumer - Owner might not know if consumer needs access still. Maybe we can have a flow like: requester gets notified and has a choice -> revoke request OR resubmit request for approval -> owner now needs to approve request

B3: Inform both producer and consumer that share is reaching its expiry

I believe in this method, it might make sense to inform the consumer of dataset ONLY since they are the ones that requested a certain priority.

If access is renewed in timely manner and expiration not reached - Do nothing

What happens when expiration is reached since access was not renewed in timely manner?

C1: Escalate to skip level: Option to escalate to a higher authority in the chain for further review or action.

C2: Revoke access to dataset: Ability to revoke access to the share, potentially disrupting downstream processes dependent on the dataset.

C3: Show visual indicator on UI: Display a prominent indicator (e.g., red button) signaling the share's expired status without immediately revoking access, preventing disruptions to existing processes.

Now in this section, we can choose to handle expiration based on priority levels. For example: If it was a priority 3 level share request, we can go with C2 option since it might not have a real impact on any downstream systems. OR if it was a priority 2 level share request, we can have a C3 + C1 level combined approach - where we show a visual indicator with a grace period to renew, and if requestor still does not renew in grace time, we can escalate to the next level.
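The priority-based handling described above could be sketched roughly as follows. This is only an illustration of the idea in Solution 2, not data.all code: the `ExpiryAction` enum, the `action_on_expiry` function, and the 14-day grace period are all hypothetical.

```python
from enum import Enum


class ExpiryAction(Enum):
    NONE = "none"            # Priority 1: the share never expires
    REVOKE = "revoke"        # C2: revoke access
    INDICATOR = "indicator"  # C3: show a visual indicator, keep access
    ESCALATE = "escalate"    # C1: escalate to the next level


def action_on_expiry(priority: int, days_past_expiry: int, grace_days: int = 14) -> ExpiryAction:
    """Pick an action for an expired share based on its priority.

    grace_days is an assumed value; the thread leaves all time ranges open.
    """
    if priority == 1:
        return ExpiryAction.NONE
    if priority == 3:
        # Short-lived shares are assumed to have no critical downstream users.
        return ExpiryAction.REVOKE
    if priority == 2:
        # C3 + C1 combined: indicator during the grace period, then escalate.
        if days_past_expiry <= grace_days:
            return ExpiryAction.INDICATOR
        return ExpiryAction.ESCALATE
    raise ValueError(f"unknown priority: {priority}")
```

For example, `action_on_expiry(2, 20)` would escalate because the assumed 14-day grace period has passed.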

anushka-singh avatar Mar 05 '24 20:03 anushka-singh

Questions still remaining to be answered:

  1. Finalize on all numbers and configurations.
  • [ ] When should notification be sent out?
  • [ ] What are the expiration option periods? 1 month/3month/forever OR something else?
  2. Should there be a separate RDS table in which to store the expirations? Or add a new column in dataset table or share table?
  3. Which solution should we go with?
  • [x] Solution 1
  • [ ] Solution 2
  • [ ] Something else
  4. In solution 1 - producer sets expiration; should they be allowed to change expiration via data all UI? For example, if I created dataset with 3 month expiration, I later realized I want it to be 1 month expiration instead, can I then change it via data all UI?
  5. In solution 2 - consumer sets expiration; I created share request with option of priority 2 but immediately after realized I needed priority 3 level access. Should I be allowed to change my access request?

anushka-singh avatar Mar 05 '24 20:03 anushka-singh

I like solution 1 as it allows setting granular expiration of shares. In this design, maybe we can send out notifications 1 month before the share expiration, then 1 week before, and then 1 day before. Once the share has expired, revoke shares for datasets which have a higher level of confidentiality (?). Another option would be to let the dataset owner specify what action to take when the share expires: if the owner chooses revoke, then revoke at expiry; if the owner chooses notify, then notify; or both. I like the UI indicators part. We can add a filter on the share list view to show only the shares which have expired.

Once the share is about to expire, I like the option in which the consumer can request an extension on the share and then the producer approves the extension. And I think informing (via email notifications) both producer and consumer about the share expiry would be prudent, just the way we send email notifications to both the producer and the consumers on a share request.

Q. In this solution though, should we also allow the producer to modify the expiration period while approving the share / after approving the share?

If we go with option 2, maybe we could use the confidentiality of the dataset to determine the priority?

As for the questions posed by you in the comments

When should notification be sent out? -> 1 month before the share expires, then 1 week, 2 days, and 1 day before.

What are the expiration option periods? 1 month/3month/forever OR something else? -> If it's not a lot of dev effort, a custom expiration would be best.

Should there be a separate RDS table in which to store the expirations? Or add a new column in dataset table or share table? -> I think we can extend the dataset table and add more fields for this feature.

Which solution should we go with? -> I like solution 1 as it lets the consumer and producer have customized expiration limits as per their requirements.
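A small sketch of the reminder cadence suggested above (1 month, 1 week, 2 days, and 1 day before expiry). The function name and the choice to approximate "1 month" as 30 days are assumptions made for illustration only.

```python
from datetime import date, timedelta

# Offsets in days before expiry; "1 month" is approximated as 30 days (assumption).
REMINDER_OFFSETS_DAYS = (30, 7, 2, 1)


def reminder_dates(expiry: date, offsets=REMINDER_OFFSETS_DAYS) -> list:
    """Return the dates on which reminders would go out, earliest first."""
    return sorted(expiry - timedelta(days=d) for d in offsets)
```

A scheduled job could compare today's date against this list to decide whether to send a notification for a given share.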

Few question(s)

  1. How do we trigger actions once the share is expired ? Will there be an event stored in SQS which is triggered on expiration time ? Or will there be polling to check if shares are expired ?

TejasRGitHub avatar Mar 06 '24 19:03 TejasRGitHub

Final Design Review

  • Adding 2 phases. Phase 1 will have major functionalities. Phase 2 will have nice-to-haves that can be implemented later as an add-on.
  • For better understanding, follow each dotted rectangle in the below diagram with description in each section. They will be correlated:

Phase 1:

Design overview:

Design is divided into 2 parts:

  1. Create and approve share request
  2. Reaching expiration of shares

expirationsharesuserflow drawio (2)

Part 1: Create and approve share request

Advanced controls tab:

  • Note: in our last meeting, we discussed having a calendar like popup in adv control where user could select specific dates until when they wanted to give max access to datasets. For example, I could give a random date like 5/6/2024. During an internal discussion, we realized we instead want to give the option to set cadence for renewal on a monthly or quarterly basis. The main reason for that was, if shares expire at random times over the year, for the producer to extend validity, will be cumbersome. They would get random requests for extension at no specific cadence and might forget to provide access in a timely manner. Hence, the design going ahead assumes that shares can be requested such that they will always expire at end of month/quarter. Renewal notifications will be sent to producers only at end of month/quarter based on what the producer decides in adv controls.

  • Producer creates/imports dataset and specifies controls in advanced control tab.

  • Can specify the following:

  1. Expiry and approval cadence - drop down between monthly and quarterly
  2. If they select monthly, then: next fields will say:
    • Minimum validity of expiry in months - this will be a field to fill a number in
    • Maximum validity of expiry in months - this will be a field to fill a number in
  3. If they select quarterly, then: next fields will say:
    • Minimum validity of expiry in quarters - this will be a field to fill a number in
    • Maximum validity of expiry in quarters - this will be a field to fill a number in
  4. Enable auto approval of shares - toggle button
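As a rough sketch, the advanced controls listed above could be represented and validated like this. The class and field names are hypothetical and do not reflect the actual data.all Dataset table schema.

```python
from dataclasses import dataclass

VALID_CADENCES = ("monthly", "quarterly")


@dataclass(frozen=True)
class ExpirationSettings:
    """Hypothetical container for the producer's advanced controls."""
    cadence: str                 # "monthly" or "quarterly"
    min_periods: int             # minimum access period, in months or quarters
    max_periods: int             # maximum access period, in months or quarters
    auto_approval: bool = False  # "Enable auto approval of shares" toggle

    def __post_init__(self):
        if self.cadence not in VALID_CADENCES:
            raise ValueError(f"cadence must be one of {VALID_CADENCES}")
        if self.min_periods < 1:
            raise ValueError("min_periods must be at least 1")
        if self.max_periods < self.min_periods:
            raise ValueError("max_periods must be >= min_periods")
```

Validating at construction time means a dataset can never be stored with a maximum below its minimum.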

Frontend mockups: Screenshot 2024-04-12 at 10 21 35 AM

Screenshot 2024-04-12 at 10 22 05 AM Screenshot 2024-04-12 at 10 22 19 AM
  • Adv controls stored in Dataset table in RDS. Example table entries shown in main diagram above.

Consumer creates share request

Frontend mockup: Screenshot 2024-04-12 at 10 37 44 AM

  • On requestAccessModal, there will be a new compulsory field - Access Period
  • Max time a user can request access, will depend on the max time set by producer
  • That max time will be pulled from RDS and will be used to add constraints to the field
  • This field will be populated based on the overall expiration for the dataset set by producers. For example, if a producer set max access to 6 months, only 1 to 6 can be selected on this UI.
  • If a producer sets max expiration as 8 months, the consumer won't be able to request access beyond that
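The constraint on the Access Period field, together with the rule that shares always expire at the end of a month or quarter, might look like the sketch below. All names are hypothetical. It also assumes that the month (or quarter) in which the share starts counts as the first period; the design does not settle this.

```python
import calendar
from datetime import date


def validate_requested_period(requested: int, min_periods: int, max_periods: int) -> int:
    """Reject requests outside the producer's configured range."""
    if not min_periods <= requested <= max_periods:
        raise ValueError(f"access period must be between {min_periods} and {max_periods}")
    return requested


def expiry_date(start: date, periods: int, cadence: str = "monthly") -> date:
    """Last day of the month or quarter in which the access period ends.

    Assumption: the period containing `start` counts as period 1.
    """
    if periods < 1:
        raise ValueError("periods must be at least 1")
    if cadence == "monthly":
        last_month_index = start.year * 12 + (start.month - 1) + (periods - 1)
    elif cadence == "quarterly":
        quarter_end_month = ((start.month - 1) // 3) * 3 + 3  # 3, 6, 9 or 12
        last_month_index = start.year * 12 + (quarter_end_month - 1) + 3 * (periods - 1)
    else:
        raise ValueError("cadence must be 'monthly' or 'quarterly'")
    year, month0 = divmod(last_month_index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, last_day)
```

Under this assumption, a 6-month share starting on 12 April 2024 would expire on 30 September 2024.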

Auto - approval will work as is currently

Share is granted by producer

  • Once share is granted, validity is stored in share object table in RDS
  • 3 new columns will be added - Validity, Submitted for renewal (y/n), reason for rejection of extension
  • Validity - At a time, share can only be created for a max of this time. For example, 6 months or 4 quarters. Once this time is up, I need to create a request for extension
  • Submitted for renewal: will get updated later in the design doc
  • extensionRejectReason - if producer wants to specify why extension was not allowed

Part 2: Reaching expiration of shares

Changes on shareView page

Changes on share view page:

  1. Extend validity share button: User can click on this button at any time to extend validity of their share.
  2. If they click on this button, a pop-up modal will show up where they can specify extension - more on this later
  3. Share expiring in column - keeps updating like a timer?
Screenshot 2024-04-12 at 10 53 24 AM

Handle expired share

  • This is when extension was not requested and share expired
  • Shares can only expire at end of month/quarter
  • monthly ECS task
    1. Query the share table to get expired shares, i.e. shares that have "share submitted for renewal" as "FALSE" and "validity" column as "0"
    2. Call revoke_share ecs task on these shares
  • Notify requestors that share has expired via email/UI notifications using current systems
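The monthly task described above could be sketched as follows, using an in-memory list in place of the RDS share table. The `Share` dataclass and the `revoke` / `notify` callbacks are hypothetical stand-ins; in data.all the revocation would go through the existing revoke_share ECS task and notification systems.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Share:
    uri: str
    validity: int                # remaining periods; 0 means expired
    submitted_for_renewal: bool  # True while an extension request is pending


def find_expired(shares: Iterable[Share]) -> List[Share]:
    """Shares with validity 0 and no pending extension request."""
    return [s for s in shares if s.validity == 0 and not s.submitted_for_renewal]


def run_monthly_expiration(
    shares: Iterable[Share],
    revoke: Callable[[Share], None],
    notify: Callable[[Share], None],
) -> List[str]:
    """Revoke every expired share, notify its requestor, return the revoked URIs."""
    expired = find_expired(shares)
    for share in expired:
        revoke(share)
        notify(share)
    return [s.uri for s in expired]
```

Passing the side effects in as callbacks keeps the selection logic easy to test without touching AWS.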

Mechanism to request extension

  • Consumer comes to share view UI
  • Clicks on extend share validity
  • Pop up modal like when verifying share or re-applying share pops up
  • Simple dropdown where user can select months/quarters they are requesting for validity to be extended
  • Dropdown again populated from dataset table in rds that stores max validity that can be requested
  • Hit submit
  • On button click, update share table with new expiry and mark submit for renewal to "TRUE".

At end of each month

  • Monthly ecs task -
    • finds out all shares up for a renewal by querying share table and finding shares marked as submit for renewal to "TRUE"
    • Updates shareView UI with 2 new buttons - approve share extension and reject share extension. Check mock up.
  • Producer gets an email with a list of share links
  • They go to share view UI and click on either button:
Screenshot 2024-04-12 at 11 33 28 AM
  • Producer can either reject or approve extension. Either way, share table in rds will be updated with marking "submitted for renewal" as "False" and expiry set to 0 if rejected and new expiry if approved. Producer can also specify a reason for rejection that can be displayed on the UI or sent via notification to requestor.

  • Notify requestor about approval or rejection via email.
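A minimal sketch of the extension request and approval flow described above. The field names are hypothetical. One deliberate deviation, flagged as an assumption: the requested period is held in a separate `requested_validity` field until the producer decides, rather than overwriting the current expiry at submit time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Share:
    validity: int                                  # remaining periods
    submitted_for_renewal: bool = False
    requested_validity: Optional[int] = None       # hypothetical staging field
    extension_reject_reason: Optional[str] = None


def request_extension(share: Share, periods: int, max_periods: int) -> None:
    """Consumer asks for an extension, bounded by the dataset's maximum."""
    if not 1 <= periods <= max_periods:
        raise ValueError(f"extension must be between 1 and {max_periods}")
    share.requested_validity = periods
    share.submitted_for_renewal = True


def decide_extension(share: Share, approved: bool, reason: Optional[str] = None) -> None:
    """Producer approves or rejects; either way the renewal flag is cleared."""
    if not share.submitted_for_renewal:
        raise ValueError("no extension request is pending")
    if approved:
        share.validity = share.requested_validity
        share.extension_reject_reason = None
    else:
        share.validity = 0  # as in the design: expiry set to 0 on rejection
        share.extension_reject_reason = reason
    share.requested_validity = None
    share.submitted_for_renewal = False
```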

Phase 2:

  1. Let requestors and producers change access period after the fact. For example, I set expiration of dataset to 6 months, but later I want to change it to 3 months. There should be a way to handle that.
  2. Implement nagging emails to requestors to extend validity in a timely manner.

anushka-singh avatar Apr 12 '24 15:04 anushka-singh

  • I think advanced control button is a bit unintuitive. I would put a panel at the bottom of the form which is by default closed-up and you can extend it there. I'd put it under classification panel and move auto approvals into it.

  • What would happen if a share extension is rejected? What will the requester see? Where will you show the rejection reason? We need a mockup I think? Will we allow the requester to submit for re-extension? I think we should.

  • I find the naming "minimum validity" and "maximum validity" not very self explanatory or just a bit weird. The share doesn't become valid/invalid - it expires. On the dataset UI I would perhaps change "Expirity validity (months)" to match the access modal: "Access period (months)" with Minimum (months) and Maximum (months). Inside the db I'd call these fields expiry_min_duration and expiry_max_duration.

  • Rather than having a button "Extend share validity" change it to "Request access extension" or "Request extension"

  • I think you need a field to track when the share extension was granted so you can calculate when it will expire. What will be the exact steps to calculate what exact time the share expires? You need some kind of starting date I believe.

zsaltys avatar Apr 15 '24 10:04 zsaltys

> I think advanced control button is a bit unintuitive. I would put a panel at the bottom of the form which is by default closed-up and you can extend it there. I'd put it under classification panel and move auto approvals into it.

We decided on adv control button in the last design meeting. I am okay with the panel at the bottom too. Curious to know what others think - @dlpzx @noah-paige

> What would happen if a share extension is rejected? What will the requester see? Where will you show the rejection reason? We need a mockup I think? Will we allow the requester to submit for re-extension? I think we should.

For the initial design, if an extension is rejected, we can notify the requestor via email that the request has been denied. If it's not a significant effort, I will implement this similarly to how a share request is currently denied. The producer will be able to give a reason for rejection after clicking "Deny", and the reason can show up in the UI in the Share Object Comments section. Yes, users should be able to submit for re-extension. Not sure how to achieve this though. One way would be for them to delete the request and create a new one - but this might mean they would need to create a new share request from scratch.

> I find the naming "minimum validity" and "maximum validity" not very self explanatory or just a bit weird. The share doesn't become valid/invalid - it expires. On the dataset UI I would perhaps change "Expirity validity (months)" to match the access modal to "Access period (months)" Minimum (months) and Maximum (months). Inside the db I'd call these fields expiry_min_duration expiry_max_duration..

Yes, I hadn't given much thought to the naming of fields yet. But this is helpful! Will take these suggestions into consideration.

> Rather than having a button "Extend share validity" change it to "Request access extension" or "Request extension"

Agreed!

> I think you need a field to track when the share extension was granted so you can calculate when it will expire. What will be the exact steps to calculate what exact time the share expires? You need some kind of starting date I believe.

Yes, I thought about this during the design. One way I could think of is, when the monthly ECS task runs, we keep subtracting 1 from the expiry_validity (months) column of a share - unless the share is extended already. In that case, we might not need a set date at all, since the next time this column becomes 0 we can tell that the share has expired. Basically, sending shares for extension and keeping validity of shares fixed to the end of a month simplifies things for us, and we can use it to our advantage in these calculations. I want to give this a try during development. If it doesn't work we can resort to the calculation of set dates.
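The countdown idea above can be illustrated with a tiny sketch. The function names are hypothetical, and reading "unless a share is extended already" as "skip shares with a pending extension" is an assumption.

```python
def monthly_tick(remaining_months: int, pending_extension: bool = False) -> int:
    """Value of the counter after one run of the monthly task."""
    if pending_extension:
        # Assumption: shares awaiting a producer decision are left untouched.
        return remaining_months
    return max(remaining_months - 1, 0)


def is_expired(remaining_months: int) -> bool:
    """A share is treated as expired once its counter reaches zero."""
    return remaining_months == 0
```

A share approved for 3 months would therefore count down 2, 1, 0 over three monthly runs. The trade-off is that a missed or repeated task run shifts the expiry, which a stored date would not.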

anushka-singh avatar Apr 15 '24 14:04 anushka-singh

@anushka-singh I'd just keep a field when the share was extended last.. when the share is created the date becomes the same as share approval date.

zsaltys avatar Apr 15 '24 14:04 zsaltys

Hi @anushka-singh , I like the design proposal. Very thorough and detailed.

I just have one comment. Enable Auto approval is currently present in data.all on the main UI (on the dataset creation page), and I also saw the same thing in the "Advanced Controls" section. I think we can just have it placed either inside OR outside on the dataset creation page.

Few questions -

  1. Will the dataset owner be able to edit the dataset to change the validity / expiration periods? If yes, will all the shares get updated?
  2. (Nice to have ) Instead of sending emails for each share to the dataset owners, can we send a consolidated email to the dataset owner about all the shares ? This will address the problem in which the owners will get a lot of emails.

TejasRGitHub avatar Apr 15 '24 16:04 TejasRGitHub

> I just have one comment. Enable Auto approval is currently present in data.all on the main UI ( on dataset creation page ) . And I also saw the same thing in the "Advanced Controls" section. I think we can just have it placed either inside OR outside on the dataset creation page.

  • Yes, we will have it only in one place. I will move it under adv controls or the expandable pane Zi suggested.

> Few questions -
>
>   1. Can the dataset owner be able to edit the dataset to change the validity / expiration periods ? If yes, will all the shares get updates ?
  • This can be addressed in phase 2. I have mentioned it in Phase 2 section of the design already. Once we have agreement, I will file github issues for phase 2 tasks.
>   2. (Nice to have ) Instead of sending emails for each share to the dataset owners, can we send a consolidated email to the dataset owner about all the shares ? This will address the problem in which the owners will get a lot of emails.
  • Yes, that's exactly the plan. Producer will get just one email which includes all shares up for extension.

anushka-singh avatar Apr 15 '24 16:04 anushka-singh

Hi @anushka-singh thanks for the diagrams and mock-ups it was very easy to follow (I drafted my comments before looking at the feedback of others to have an unbiased opinion)

  1. My first comment was the same as you already commented. I would like a dropdown panel in the same window in the dataset creation view with the advanced configuration params for sharing.
  2. There is an edge case that I was a bit concerned about: what happens when a requestor opens a share request almost at the end of a month/quarter? I think we should let them know in the share request "it will expire in Xdays" with whatever they introduce. For that I think it is useful to store the extensionRequestedDate as Zil was also mentioning
  3. Since the expiration date affects all share items equally I would not plot it in the sharedItems table, because it should be the same for all. It is a ShareObject field, so I would represent it with the shareObject metadata
  4. Let's say we reached the expiration date without requesting an extension. Now all our shareItems in a share have been revoked. What happens now? Can I re-request all items? How do I set the new expiration date? The same applies when we revoke all items of a share and re-request all of them: if we do not delete the share request, we cannot request a new expiration date without doing an extension request.
  5. It is not explicitly written in the design, but I assume that if a producer rejects the extension, then the items get revoked right? It is not exactly like the rejection of a share, because in the typical rejection nothing is shared yet.
  6. Should we also introduce an extensionApproveReason?
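For the edge case in point 2, the "it will expire in X days" hint could be computed from whatever expiry date the share ends up with. This is a sketch only; the function names and message wording are hypothetical.

```python
from datetime import date


def days_until_expiry(today: date, expiry: date) -> int:
    """Whole days left before the share expires (0 on the expiry day or later)."""
    return max((expiry - today).days, 0)


def expiry_notice(today: date, expiry: date) -> str:
    """Text that could be shown in the share request window."""
    return f"This share will expire in {days_until_expiry(today, expiry)} days"
```

A request opened on 28 April for a share ending on 30 April would show 2 days, making the short window visible before the requestor submits.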

dlpzx avatar Apr 16 '24 11:04 dlpzx

> Hi @anushka-singh thanks for the diagrams and mock-ups it was very easy to follow (I drafted my comments before looking at the feedback of others to have an unbiased opinion)
>
>   1. My first comment was the same as you already commented. I would like a dropdown panel in the same window in the dataset creation view with the advanced configuration params for sharing.

Sounds good!

>   2. There is an edge case that I was a bit concerned about: what happens when a requestor opens a share request almost at the end of a month/quarter? I think we should let them know in the share request "it will expire in Xdays" with whatever they introduce. For that I think it is useful to store the extensionRequestedDate as Zil was also mentioning

Yes, fair point. We can let them know "it will expire in Xdays" in share request window.

>   3. Since the expiration date affects all share items equally I would not plot it in the sharedItems table, because it should be the same for all. It is a ShareObject field, so I would represent it with the shareObject metadata

Yes, I was looking for a better place for this too. I can put it in the share object metadata.

>   4. Let's say we reached the expiration date without requesting an extension. Now all our shareItems in a share have been revoked. What happens now? Can I re-request all items? How do I set the new expiration date? The same applies when we revoke all items of a share and re-request all of them, if we do not delete the share request we cannot request a new expiration date without doing an extension request.

We can make it work the same way revoke works. If a user did not extend the request, the share object gets revoked. Then they will have to delete the share and create a new one. Do you have a better way of handling this, or is this ok?

>   5. It is not explicitly written in the design, but I assume that if a producer rejects the extension, then the items get revoked right? It is not exactly like the rejection of a share, because in the typical rejection nothing is shared yet.

Yes, if rejected, items will get revoked at the time of expiration. If I sent a request to extend before expiration cycle and extension is rejected, share will still be valid until expiration date. Does that make sense?

>   6. Should we also introduce an extensionApproveReason?

Sure we can do that.

anushka-singh avatar Apr 16 '24 22:04 anushka-singh

@anushka-singh will you be picking this up again as part of v2.7 ?

anmolsgandhi avatar Jul 01 '24 19:07 anmolsgandhi

@anmolsgandhi Let me discuss with the team internally and get back to you.

anushka-singh avatar Jul 02 '24 15:07 anushka-singh

@anmolsgandhi We are working on discussing our internal priorities at the moment. We should have an update by next week.

anushka-singh avatar Jul 03 '24 21:07 anushka-singh

@TejasRGitHub will build on my work on this for 2.7, while I focus on other tasks. The design is ready and some frontend work has already been done. He will continue to build on that.

anushka-singh avatar Jul 09 '24 16:07 anushka-singh

UI views :

  1. Dataset Creation Form [image]. When not expanded [image]

  2. Dataset Overview -> Governance & Classification [image]

  3. Dataset Edit form shows the same view [image]

  4. Request modal [image]

  5. Create Share Extension [image]

  6. Share View Page when requested for Extension [image]

Modifications made to the proposed design mentioned here - https://github.com/data-dot-all/dataall/issues/1083#issuecomment-2052006984

  1. Request a non-expiring share: There are scenarios in which a prod user role might require access to the dataset indefinitely, so the share should never expire. In this case, a user can request a non-expiring share on a dataset which has expiration enabled.

Share Request Modal for accessing Non-expiring shares : image

Share View when the Request is Approved : image

  2. Configurable ECS task scheduling for sending reminder emails and revoking an expired share: Instead of triggering an ECS task only at the end of the month, the schedule is configurable and triggers ECS tasks 'x' days before the end of the month. The idea is to trigger multiple ECS tasks on different dates in a month to inform the user about the share expiration.

"share_expiration": {
    "active": true,
    "run_schedule": [1, 3, 7]
}

Referring to the config above, the share-expiration ECS task will be triggered 1, 3, and 7 days before the end of the month.
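To make the schedule concrete, the sketch below turns the `run_schedule` offsets into calendar dates for a given month. It only illustrates the arithmetic; how data.all actually schedules the ECS task is not shown in this thread, and `trigger_dates` is a hypothetical helper.

```python
import calendar
import json
from datetime import date, timedelta

# The fragment from the comment above, wrapped in braces so it parses as JSON.
CONFIG = json.loads('{"share_expiration": {"active": true, "run_schedule": [1, 3, 7]}}')


def trigger_dates(year: int, month: int, config: dict) -> list:
    """Dates on which the share-expiration task would run in the given month."""
    settings = config["share_expiration"]
    if not settings.get("active", False):
        return []
    last_day = date(year, month, calendar.monthrange(year, month)[1])
    # Each entry in run_schedule is a number of days before the end of the month.
    return sorted(last_day - timedelta(days=d) for d in settings["run_schedule"])
```

For August 2024 this yields 24, 28, and 30 August.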

TejasRGitHub avatar Aug 01 '24 17:08 TejasRGitHub

Close as completed. To be released in 2.7

dlpzx avatar Sep 05 '24 12:09 dlpzx