altinn-studio

Analyse options for sharing data across instances and applications

Open acn-sbuad opened this issue 4 years ago • 9 comments

Description

Sometimes the same file attachments are relevant across multiple instances and users. For now, an attachment is tightly bound to a single instance, so identical files must be uploaded again each time the attachment is relevant.

The service owner should be able to upload a file once and reference it across multiple instances and users. A specific use case:

  • DiBK uploads a nabovarsel (multiple PDFs - potentially large files)
  • They create instances for the recipients of the nabovarsel
  • They reference the nabovarsel files to the different instances (without having to upload identical files multiple times)
  • Shared file should not be deleted during cleanup

In scope

  • Where do we store the files & how to structure
  • Where do we store the metadata about the files
  • What would separating this logic into a new platform component look like?
  • What process is required for an app owner to store the file? (authorization)
  • Which model modifications are required?
  • What is the process for linking a shared file to an instance?
  • What is the process for unlinking a shared file from an instance?
  • How to retrieve a shared file as an end user? (+authorization)
  • How to ensure that shared files are not deleted during cleanup
  • How to handle shared files in localtest.
  • How can we ensure that a file is not referenced by any instances? Should it be possible for an application owner to delete these files and should we be responsible for ensuring that this doesn't affect any active/archived instances?

Considerations

  • [ ] Is it okay for app owners that we don't have any authorization on reading the data? I.e. anyone with a link can access the file.
    • Authorization and limiting access to the resources is desired.
  • [ ] How will a solution look if we use a separate platform component for storage?

Out of scope

What's out of scope for this analysis?

Constraints

Constraints or requirements (technical or functional) that affect this analysis.

Analysis

Where to store the files

Alternative A

Within the application owner's storage account [org]altinn[env]strg01 and the container used for app data, [org]-[env]-appsdata-blob-db, a new section is created for shared files (fileShare, or maybe there's a more suitable name without other connotations). A new section is put in before adding folders for the categories to make it easier to tell the instance data apart from the shared data. Without it, a category that matches an appId would cause a conflict.

Alternative B

A new storage account is created for each application owner solely dedicated for shared files.

Blob container structure for the fileshare:

fileShare
    |-- category
    |   `-- dataGuid
    |       |-- dataBlob
    |       `-- fileInfo.json
    |-- category
    |   `-- dataGuid
    |       |-- dataBlob
    |       `-- fileInfo.json
    `-- nabovarsel
        `-- dataGuid
            |-- dataBlob
            `-- fileInfo.json

We also need some metadata about the blob, such as:

  • Id / dataGuid
  • fileName
  • contentType
  • created
  • lastChanged
  • lastChangedBy (how do app owners feel about tracking who last changed the file?)
  • blobStoragePath or category (could include a direct link, only available to the app owner, for accessing the element)

This is represented as the fileInfo.json object.
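As a rough illustration of the metadata listed above, fileInfo.json could be modelled as below. This is only a sketch: the field names are taken from the list, while the types, the ISO timestamp format and the helper names `new_file_info` and `to_json` are assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class FileInfo:
    """Hypothetical shape of fileInfo.json; field names follow the list above."""
    id: str  # the dataGuid
    fileName: str
    contentType: str
    created: str
    lastChanged: str
    lastChangedBy: str
    blobStoragePath: str


def new_file_info(data_guid: str, file_name: str, content_type: str,
                  changed_by: str, blob_storage_path: str) -> FileInfo:
    # Assumption: timestamps are stored as ISO 8601 strings in UTC.
    now = datetime.now(timezone.utc).isoformat()
    return FileInfo(
        id=data_guid,
        fileName=file_name,
        contentType=content_type,
        created=now,
        lastChanged=now,
        lastChangedBy=changed_by,
        blobStoragePath=blob_storage_path,
    )


def to_json(info: FileInfo) -> str:
    # Serialize to the JSON document stored next to the blob.
    return json.dumps(asdict(info), indent=2)
```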

Where to store metadata about the files

Possible options here:

  • blob storage in the same folder as the blob itself
  • a new collection in CosmosDB partitioned on applicationOwner
  • table in postgresql

**Storing in storage account**

  • (+) can ensure that the data exists when linking it to an instance as we are already in the container to retrieve metadata.
  • (+) could experiment with blob index tags
  • (-) cannot easily query the blobs based on metadata
  • (-) blob index tags feature is in preview and not available in Norway yet

**Storing in Cosmos**

  • (+) possible to query files in the file share
  • (-) will require another collection
  • (-) we should probably verify that the blob exists before connecting the metadata to an instance. Would require an additional operation.

**PostgreSQL**

  • If we set up a new platform component for the fileshare we wouldn't have any previous bindings to storage affecting our decision, and the practicality of using PostgreSQL should be considered.

What would separating this logic into a new platform component look like?

With regard to performance and maintainability, introducing a new platform component rather than using Platform Storage wouldn't have any large effect, and the end user will not notice the difference.

A new platform component is introduced: Platform Data / Platform Fileshare / Platform [insert descriptive component name]. The purpose of this component would be to expose endpoints for storing and managing data not directly related to an instance (i.e. not form data or attachments for a single instance).

The platform component would require a link to authentication (well known endpoint + redirect for missing auth) and authorization (PDP).

To make this platform component open for further extension, we should spend some time figuring out how to create the link to the storage account in a generic way, so that any storage account can be used in the future. For retrieving data, the blob storage path should be helpful. When storing data, we would need to determine the link to a storage account based on something else.

  • E.g. each controller is used to manage data in a specific type of storage account?
  • Information about the storage account must be included in the request?

My largest concern about introducing a new platform component is making sure that we don't design it in a way that limits which future cases it can support.

What process is required for an app owner to store the file?

A new endpoint must be exposed in the platform component: POST %/api/v1/data/{org}/{category}

Authorization could entail matching the orgClaim in the claims principal to the org in the route, or introducing a new scope in Maskinporten. If it should be possible to nest categories, I think the category parameter must be a query param in order to allow "/".
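The two authorization options and the query-param idea could look roughly like this. It is a sketch under assumptions: the claim keys `org` and `scope`, the space-separated scope string and the example URL are hypothetical, not taken from the actual platform.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def org_matches_route(claims: dict, route_org: str) -> bool:
    # Option 1: the org claim of the caller must match the org in the route.
    org_claim = claims.get("org")
    return org_claim is not None and org_claim.lower() == route_org.lower()


def has_scope(claims: dict, required_scope: str) -> bool:
    # Option 2: a dedicated scope; assumes scopes arrive as one space-separated string.
    return required_scope in claims.get("scope", "").split()


def category_from_url(url: str) -> Optional[str]:
    # As a query param, a nested category such as "nabovarsel/2020" survives intact.
    values = parse_qs(urlsplit(url).query).get("category", [])
    return values[0] if values else None
```

With the category in the query string, the route itself stays fixed at the org level and the "/" in a nested category never competes with route segments.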

FileInfo is created based on metadata in the request and the blob is stored in the fileshare section of the app owner's storage account. This is a good time to implement a blobService that doesn't hold any logic. The job of composing the storage path should be extracted from the blobClient service.
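A minimal sketch of that split, assuming the folder layout suggested earlier: the blob service only reads and writes bytes at a given path, and path composition lives in its own function. `InMemoryBlobService`, `compose_shared_path` and `store_shared_file` are hypothetical names, and the in-memory dictionary stands in for a real storage account.

```python
from typing import Dict, Protocol


class BlobService(Protocol):
    """A blob service without logic: it reads and writes bytes at a path."""
    def write(self, path: str, content: bytes) -> None: ...
    def read(self, path: str) -> bytes: ...


class InMemoryBlobService:
    # Stand-in for a real storage account, for illustration only.
    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def write(self, path: str, content: bytes) -> None:
        self._blobs[path] = content

    def read(self, path: str) -> bytes:
        return self._blobs[path]


def compose_shared_path(category: str, data_guid: str, name: str = "dataBlob") -> str:
    # Path composition is kept outside the blob service.
    return f"fileShare/{category}/{data_guid}/{name}"


def store_shared_file(blobs: BlobService, category: str, data_guid: str, content: bytes) -> str:
    path = compose_shared_path(category, data_guid)
    blobs.write(path, content)
    return path
```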

Response contains the fileInfo JSON structure with

  • Id / dataGuid
  • fileName
  • contentType
  • created
  • lastChanged
  • lastChangedBy
  • blobStoragePath or category (- Could include a direct link only available for the app owner to access the element)

Managing and querying files in the file share

  • To delete a file in the fileShare: DELETE request specifying category and guid, or blobStoragePath
  • Get all categories: returns a list of strings (loops through all folders in the container)
  • Get metadata about all files (loops through and reads fileInfo.json for each blob)
  • Get metadata about a single file

All operations would have to be available to the whole organization, or we could introduce some sort of new scope.
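The two "loops through" operations could be sketched as follows, with a plain dictionary of path to bytes standing in for the container. The path layout fileShare/<category>/<dataGuid>/<blob> is assumed from the structure suggested earlier, and nested categories are treated as everything between the prefix and the dataGuid.

```python
import json
from typing import Dict, List


def list_categories(blobs: Dict[str, bytes]) -> List[str]:
    categories = set()
    for path in blobs:
        parts = path.split("/")
        # Expected layout: fileShare/<category...>/<dataGuid>/<blob name>
        if len(parts) >= 4 and parts[0] == "fileShare":
            categories.add("/".join(parts[1:-2]))
    return sorted(categories)


def list_file_infos(blobs: Dict[str, bytes]) -> List[dict]:
    # Reads every fileInfo.json in the shared section; cost grows with the number of files.
    return [
        json.loads(content)
        for path, content in blobs.items()
        if path.startswith("fileShare/") and path.endswith("/fileInfo.json")
    ]
```

Both functions touch every blob path, which is the cost the "(-) cannot easily query the blobs based on metadata" point refers to.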

How to link file to an instance

Authorization on org. Endpoint exposed through the application: HTTP POST / HTTP PUT org/app/instances/{instanceId}/data/link?

Query params (required: a + b, or c):

  a) category
  b) dataGuid
  c) blobStoragePath
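The "a + b or c" rule for the query params could be validated along these lines. `resolve_link_target` is a hypothetical helper, and the composed path assumes the fileShare layout suggested earlier in the analysis.

```python
from typing import Optional


def resolve_link_target(category: Optional[str] = None,
                        data_guid: Optional[str] = None,
                        blob_storage_path: Optional[str] = None) -> str:
    # Option c: a full blob storage path is enough on its own.
    if blob_storage_path:
        return blob_storage_path
    # Options a + b: category and dataGuid must be supplied together.
    if category and data_guid:
        return f"fileShare/{category}/{data_guid}/dataBlob"
    raise ValueError("Provide either category and dataGuid, or blobStoragePath")
```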

The suggested flow is as follows

  1. Retrieve fileInfo and ensure valid dataType is being linked to the instance.
  2. Check that upload doesn't break any constraints e.g. number of elements of the dataType at given task.
  3. Generate dataElement based on known info about the data with a link to the instance, and store in Platform Storage.
  4. Return info to the client.
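The four steps can be sketched in one function. The `DataType` and `Instance` classes and the dataElement fields below are simplified assumptions for illustration, not the actual platform models, and the in-memory list stands in for the call to Platform Storage.

```python
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataType:
    id: str
    allowed_content_types: List[str]
    max_count: int


@dataclass
class Instance:
    id: str
    data: List[dict] = field(default_factory=list)


def link_shared_file(instance: Instance, data_type: DataType, file_info: dict) -> dict:
    # Step 1: ensure the shared file is valid for the data type.
    if file_info["contentType"] not in data_type.allowed_content_types:
        raise ValueError("Content type not allowed for this data type")
    # Step 2: check that linking does not break the element count constraint.
    existing = sum(1 for d in instance.data if d["dataType"] == data_type.id)
    if existing >= data_type.max_count:
        raise ValueError("Maximum number of elements reached for this data type")
    # Step 3: generate the dataElement; it points at the shared blob, nothing is copied.
    element = {
        "id": str(uuid.uuid4()),
        "instanceId": instance.id,
        "dataType": data_type.id,
        "filename": file_info["fileName"],
        "contentType": file_info["contentType"],
        "blobStoragePath": file_info["blobStoragePath"],
    }
    instance.data.append(element)  # stand-in for storing in Platform Storage
    # Step 4: return info to the client (here: the new dataElement).
    return element
```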

STEP 1 - Ensure valid data type

Should be handled by the application.

STEP 2 - Check if upload doesn't break constraints

Could be handled at this point before upload is attempted or during validation. As a user I would prefer being notified during upload, but if there are arguments to not stop the upload, this option should also be considered.

STEP 3 - Generate & store new dataElement

Should this responsibility lie with the app or with storage?

  • If in app: the endpoint in storage for linking will take a dataElement as input.
  • If in storage: the endpoint in storage for linking will take fileInfo / metadata parameters as input.

STEP 4 - Return info to the client

What should be returned? The full instance or the newly created dataElement?

What is the process for unlinking a shared file from an instance?

HTTP DELETE org/app/instances/{instanceId}/data/link?

Query params (required: a + b, or c):

  a) category
  b) dataGuid
  c) blobStoragePath

Deletes dataElement from cosmos, but nothing else.

How to retrieve file as an end user

The existing GET method in the platform component is used, with org, app, instance and dataGuid as input. Authorization: if the user has access to read the instance and the shared blob is linked to the instance, the user is allowed to read the shared data.
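The authorization rule amounts to two conditions that must both hold. A minimal sketch, assuming the instance's dataElements are available as dictionaries with a `blobStoragePath` field (a hypothetical simplification):

```python
from typing import List


def can_read_shared_blob(user_can_read_instance: bool,
                         instance_data: List[dict],
                         blob_storage_path: str) -> bool:
    # The blob must be linked to the instance through one of its dataElements.
    linked = any(d.get("blobStoragePath") == blob_storage_path for d in instance_data)
    # Read access to the instance alone is not enough; the link must exist too.
    return user_can_read_instance and linked
```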

How to ensure that shared file is not deleted during cleanup

Check if the file path contains a keyword. If so, do not delete the blob; simply delete the dataElement from Cosmos DB.
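A sketch of that cleanup rule, assuming the keyword is the `fileShare/` section name suggested earlier and using dictionaries as stand-ins for blob storage and the dataElement collection:

```python
from typing import Dict

# Assumption: shared files are recognised by the section name in their path.
SHARED_KEYWORD = "fileShare/"


def cleanup_data_element(element: dict,
                         blobs: Dict[str, bytes],
                         data_elements: Dict[str, dict]) -> None:
    path = element["blobStoragePath"]
    # Only instance-bound blobs are deleted; shared blobs are left in place.
    if SHARED_KEYWORD not in path:
        blobs.pop(path, None)
    # The dataElement itself is always removed.
    data_elements.pop(element["id"], None)
```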

How to handle in localtest

All of the suggestions above can be supported in localtest. The details won't be specified at this point.

Conclusion

Short summary of the proposed solution.

Tasks

  • [ ] Is this issue labeled with a correct area label?
  • [ ] QA has been done

acn-sbuad, Nov 27 '20 09:11

A new analysis should be done, looking at the possibility of using a separate platform component for the storage.

acn-sbuad, Dec 08 '20 07:12

@altinnadmin @TheTechArch @SandGrainOne I would like some feedback on whether or not the solution should entail a new platform component or if Storage should be reused. For the remaining details, I think these can be decided on in the issues that cover the implementation.

acn-sbuad, Dec 20 '20 18:12

First: I believe that at some point both options will be needed. Together with the pilots, we need to figure out what is needed now.

Comments on the open-non-authorized solutions

  1. I believe it is ok to not protect against accidental deletion of files.
  2. I am not sure we need any additional metadata about blobs. I believe we can set content-type directly on the blob's properties in blob storage and read it back from there. Link
  3. For the instance, I believe there should not be any difference between a link to a document at nav.no and a document in this shared component. We may need a list of approved domains if we are going to support this.

TheTechArch, Jan 04 '21 08:01

One idea I've had is that this could be used to test a new project structure. All our platform components are single projects. This creates very tight dependencies between different application layers. The idea is to adopt some of the ideas from the NorthwindTraders project. This is a project that is slowly becoming somewhat of a standard for new .Net core applications.

SandGrainOne, Jan 04 '21 08:01

This is a project that is slowly becoming somewhat of a standard for new .Net core applications.

There is some good stuff (test and src at root), and some bad stuff (uppercase, angular, old tech). We should not use NorthWind as a template, but some ideas might be ok to reuse.

altinnadmin, Jan 05 '21 14:01

And perhaps we should finally extract our "perfect .net core app template" into a separate repo? :)

altinnadmin, Jan 05 '21 14:01

If authZ is an absolute requirement the functionality should be placed in Platform Storage. If authZ is not an absolute requirement the path of creating a new platform component will be preferred.

acn-sbuad, Jan 06 '21 08:01

The app owner has decided to solve the issue by other means. Further analysis of this is therefore postponed. The team itself considers a new platform component without authorization to be the most desirable option for the future, as the use cases could be many.

acn-sbuad, Jan 25 '21 09:01

Should be seen in connection with Formidlingstjenesten in Altinn 2 and eFormidling/DPO. "Drop-box for businesses". Could then be a basic microservice.

FinnurO, Jul 06 '22 20:07

@SandGrainOne can this issue be moved to one of the platform-repos?

nkylstad, Feb 28 '24 12:02

@nkylstad Sounds like the new Broker product could/should be used to cover this case. //cc: @leogasnier

altinnadmin, Feb 28 '24 12:02

We've discussed that maybe Storage should have the ability to store a blob independently of an instance, making it possible to link to the same blob from multiple data elements across multiple instances. I'm not sure if Broker is the right product for all cases where this could be useful in an app.

SandGrainOne, Feb 28 '24 13:02

At least for the specific use case in this issue, using one file in Broker + multiple elements in Dialogporten pointing to the same file seems like the obvious solution.

altinnadmin, Feb 28 '24 13:02

This case has already been discussed quite a bit in relation to both broker and correspondence (multicast), so either broker+dialogporten or broker+dialogporten+correspondence could be ways of solving this. The first alternative should be viable in mid-April from the broker side. @erikhag1, feel free to join the discussion :) In general terms, I believe it would be a good thing to be a bit purist about what the different products in our portfolio do, so that we don't end up with cannibalisation/internal competition across the platform. But there are some unclear boundaries today that we should be aware of, so thanks @altinnadmin!

leogasnier, Feb 28 '24 14:02

I believe the current solution is using Correspondence, with attachments being copied into each one. Broker would not fit, but maybe a new correspondence replacement would.

SandGrainOne, Feb 28 '24 14:02

Thanks for your input! Seeing as nothing has been done here for over 2 years, and our theory is that either broker or new correspondence might cover the use case, I'm closing this issue. If anyone disagrees, feel free to re-open and transfer to whatever repo you deem relevant 😄

nkylstad, Apr 05 '24 11:04