PAS-148 | Data export

Open jonashendrickx opened this issue 11 months ago • 2 comments

Ticket

Description

We need to be able to export individual applications with all their relevant data, so customers can migrate to self-hosted instances.

Shape

Applications with a lot of users, or enterprise applications with event logging enabled, can end up with a very large number of records.

Considerations:

  • We need to be able to enable paging easily at a later stage or possibly offload the data export process using a serverless approach.
  • CSV:
    • RFC-4180 compliant
    • Easy to enable paging later
    • CsvHelper already includes functionality that should make paging easy to add later.
    • UTF-8 encoded

Remarks:

  • Paging is out of scope at this stage.
  • Encryption is out of scope at this stage.

The export is split into one file per entity type. Every file is a CSV document compliant with RFC 4180. Splitting by entity type should also make it easier to enable paging at a later stage, once a file grows beyond a certain size.
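
As a rough illustration of what "RFC 4180 compliant, UTF-8 encoded" means for one such file, here is a minimal sketch. It uses hand-rolled escaping rather than CsvHelper so that it stays self-contained. The class and method names (`CsvSketch`, `Escape`, `ToCsvBytes`) are hypothetical and are not taken from this PR.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper, not part of the PR: shows the RFC 4180 quoting rules.
public static class CsvSketch
{
    // Quote a field only when it contains a comma, a double quote,
    // or a line break. Embedded double quotes are doubled.
    public static string Escape(string field)
    {
        bool mustQuote = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return mustQuote ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }

    // Build one CSV document (header plus rows, CRLF line breaks) and
    // return it as UTF-8 bytes without a byte order mark.
    public static byte[] ToCsvBytes(IEnumerable<string> header, IEnumerable<IEnumerable<string>> rows)
    {
        var sb = new StringBuilder();
        foreach (var line in new[] { header }.Concat(rows))
        {
            sb.Append(string.Join(",", line.Select(Escape))).Append("\r\n");
        }
        return new UTF8Encoding(encoderShouldEmitUTF8Identifier: false).GetBytes(sb.ToString());
    }
}
```

Calling `CsvSketch.ToCsvBytes` once per entity type would yield one byte array per file. In the actual implementation CsvHelper takes care of the quoting.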

POST /backup/schedule: Schedules a new backup job.

GET /backup/jobs: Retrieves a list of past and present backup jobs with their status.

Example response:

[
  { "jobId": "guid", "status": "pending", "createdAt": "ISO-8601", "lastUpdatedAt": "ISO-8601" },
  { "jobId": "guid", "status": "failed", "createdAt": "ISO-8601", "lastUpdatedAt": "ISO-8601" },
  { "jobId": "guid", "status": "running", "createdAt": "ISO-8601", "lastUpdatedAt": "ISO-8601" },
  { "jobId": "guid", "status": "completed", "createdAt": "ISO-8601", "lastUpdatedAt": "ISO-8601" }
]
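
To make the response shape concrete, here is a minimal sketch of how such a list could be serialized with System.Text.Json. The `BackupJobResponse` record and the `BackupJobJson` helper are hypothetical. The `JobStatus` values are inferred from the statuses in the example response, and the actual enum in the PR may differ.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Assumption: enum members inferred from the statuses in the example response.
public enum JobStatus { Pending, Running, Failed, Completed }

// Hypothetical DTO mirroring the fields of the example response.
public record BackupJobResponse(Guid JobId, JobStatus Status, DateTime CreatedAt, DateTime? LastUpdatedAt);

public static class BackupJobJson
{
    // Web defaults give camelCase property names. The converter writes
    // enum values as camelCase strings instead of integers.
    private static readonly JsonSerializerOptions Options = new(JsonSerializerDefaults.Web)
    {
        Converters = { new JsonStringEnumConverter(JsonNamingPolicy.CamelCase) }
    };

    public static string Serialize(BackupJobResponse[] jobs) => JsonSerializer.Serialize(jobs, Options);
}
```

With these options, `DateTime` values are written in ISO 8601 format and a null `LastUpdatedAt` is written as `null`.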

The exported data is stored as a UTF-8 encoded byte array.

public class ArchiveJob : PerTenant
{
    public Guid Id { get; set; }

    public AccountMetaInformation Application { get; set; }

    public DateTime CreatedAt { get; set; }

    public DateTime? UpdatedAt { get; set; }

    public JobStatus Status { get; set; } = JobStatus.Pending;

    public List<Archive> Archives { get; set; } = new();
}

public class Archive : PerTenant
{
    public Guid Id { get; set; }

    /// <summary>
    /// The identifier of the backup job that created this archive.
    /// </summary>
    public Guid JobId { get; set; }

    public DateTime CreatedAt { get; set; }

    public Type? Entity { get; set; }

    [MaxLength(100 * 1024 * 1024, ErrorMessage = "Data cannot be larger than 100MB.")]
    public byte[] Data { get; set; }

    public AccountMetaInformation Application { get; set; } = null!;
}

The upload process would likely involve a user creating a new organization and importing an old app by uploading the exported files. The restoration could only start if the schemas match and all documents have been uploaded successfully.

ApplicationEvents

An additional migration for ApplicationEvents had to be included because it did not inherit from PerTenant, contrary to the conventions we had laid out for DbTenantContext. This broke the generics I had implemented in BackupWorkerService, and leaving it as it was would have significantly hurt maintainability as well.

To prevent unnecessary migrations from being executed, which could result in data loss, the property was mapped manually to its original database column name. Only the snapshot was updated, to reflect the mapping from the column name to the CLR property.

Screenshots

n/a

Checklist

I did the following to ensure that my changes were tested thoroughly:

  • [ ] Unit tests
  • [ ] Integration tests

I did the following to ensure that my changes did not introduce new security vulnerabilities:

  • [ ] Secured endpoints with secret key.
  • [ ] Sanitization against macros was skipped, given these backups are not meant to be opened in a program like Microsoft Excel.
  • [ ] In this implementation, a DoS attack would be possible with very large customers.

jonashendrickx avatar Mar 06 '24 10:03 jonashendrickx

Codecov Report

Attention: Patch coverage is 47.23502% with 1374 lines in your changes missing coverage. Please review.

Project coverage is 33.91%. Comparing base (a194a43) to head (0574e1e). Report is 135 commits behind head on main.

Files                                                  Patch %  Lines
...7_MakeApplicationEventInheritPerTenant.Designer.cs  0.00%    503 Missing :warning:
.../Sqlite/20240311143910_AddBackupTables.Designer.cs  0.00%    502 Missing :warning:
...vice/Migrations/Mssql/MsSqlContextModelSnapshot.cs  0.00%    97 Missing :warning:
...ce/Migrations/Sqlite/SqliteContextModelSnapshot.cs  0.00%    97 Missing :warning:
...igrations/Sqlite/20240311143910_AddBackupTables.cs  0.00%    65 Missing :warning:
src/Service/Backup/BackupWorkerService.cs              0.00%    50 Missing :warning:
src/Service/Backup/BackupBackgroundService.cs          0.00%    29 Missing :warning:
src/Service/Backup/BackupService.cs                    78.57%   6 Missing and 3 partials :warning:
src/Service/Models/Archive.cs                          0.00%    6 Missing :warning:
src/Common/Backup/Mapping/EntityFrameworkMap.cs        64.28%   3 Missing and 2 partials :warning:
... and 6 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #465      +/-   ##
==========================================
+ Coverage   32.63%   33.91%   +1.28%     
==========================================
  Files         504      525      +21     
  Lines       26394    28985    +2591     
  Branches      819      833      +14     
==========================================
+ Hits         8613     9831    +1218     
- Misses      17670    19038    +1368     
- Partials      111      116       +5     


codecov[bot] avatar Mar 07 '24 07:03 codecov[bot]

Re-iterating earlier comment: Let's review this PR but do not merge it until we've shipped some of the things currently in main to prod.

abergs avatar Mar 12 '24 13:03 abergs