
Allow more characters in dataset titles

Open billgeo opened this issue 2 years ago • 0 comments

User Story

So that I can create meaningful dataset titles and S3 URLs, as a data maintainer, I want to be able to use other characters (e.g. full stop `.`, apostrophe `'` (U+0027) and macronated characters) in the dataset title and therefore the S3 prefix.

We should consider downstream issues with some characters in Windows/Linux/JSON etc. (this has been documented in #1975). Update the acceptance criteria when this is done.

Acceptance Criteria

  • [ ] Given a dataset title with a full stop (.) or apostrophe (') in it, when a new dataset is created with this dataset title, then the dataset title is accepted, the dataset is created, and files are created with the full stops in the S3 prefix. See #1975 for more information.
  • [x] Given a dataset title with a macronated character in it (e.g. ā, Ō), when a new dataset is created with this dataset title, then the dataset title is accepted, the dataset is created, and files are created with the macronated character in the S3 prefix.
  • [ ] Given a dataset with any valid dataset title character, when the user copies the data to their local filesystem (Mac, Linux, Windows), then they can access the files with a file browser.
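
The criteria above could be checked with something like the following sketch. It is not geostore's actual validation code: the allow-list, the function names and the choice to normalise to NFC are all assumptions made for illustration. The Windows checks reflect the documented file naming rules (reserved characters, and no trailing full stop or space), which matter for the third criterion once full stops are allowed.

```python
import re
import unicodedata

# Hypothetical allow-list (an assumption, not geostore's real rule):
# ASCII letters, digits, underscore, hyphen, full stop, apostrophe (U+0027)
# and the macronated vowels.
MACRONATED = "āēīōūĀĒĪŌŪ"
TITLE_PATTERN = re.compile(rf"[A-Za-z0-9_\-.'{MACRONATED}]+")

# Characters that Windows rejects in file and directory names.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')


def is_valid_title(title: str) -> bool:
    """Return True if the NFC-normalised title uses only allowed characters."""
    # NFC turns "a" + combining macron (U+0304) into the single character "ā",
    # so both spellings of a macronated vowel are treated the same way.
    normalised = unicodedata.normalize("NFC", title)
    return TITLE_PATTERN.fullmatch(normalised) is not None


def windows_problems(title: str) -> list[str]:
    """List reasons a title would be awkward as a Windows directory name."""
    problems = []
    forbidden = sorted(set(title) & WINDOWS_FORBIDDEN)
    if forbidden:
        problems.append("forbidden characters: " + " ".join(forbidden))
    # Windows does not support names that end with a full stop or a space.
    if title.endswith((".", " ")):
        problems.append("ends with a full stop or space")
    return problems
```

For example, `is_valid_title("Hawke's_Bay")` and `is_valid_title("Tāmaki_2022.v1")` pass under this allow-list, while `windows_problems("title.")` flags the trailing full stop.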

Additional context

Discussion with data managers about how they want to name, organise and access their data (particularly in the aerial imagery area) highlights that we should consider adding slashes (/) and other useful characters such as full stops (.) to the characters allowed in a dataset title, and therefore in its S3 URL/path.

S3 limitations on characters in object keys: https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html

See Confluence page here and Slack discussion here
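
One S3 limit worth keeping in mind when allowing macronated characters is that object key names are limited to 1,024 bytes of UTF-8, and a macronated vowel such as ā takes two bytes rather than one. The sketch below shows a byte-length check; the `prefix/file_name` key layout and the function names are assumptions for illustration, not geostore's actual key scheme.

```python
# S3 object key names can be at most 1,024 bytes long when UTF-8 encoded.
S3_MAX_KEY_BYTES = 1024


def key_length_in_bytes(key: str) -> int:
    """Return the UTF-8 encoded length of an S3 key."""
    return len(key.encode("utf-8"))


def fits_in_s3_key(prefix: str, file_name: str) -> bool:
    """Check a hypothetical '<prefix>/<file_name>' key against the S3 limit."""
    return key_length_in_bytes(f"{prefix}/{file_name}") <= S3_MAX_KEY_BYTES
```

Counting bytes rather than characters matters here: a title of 600 macronated vowels is 600 characters but 1,200 bytes, so it would exceed the limit on its own.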

Tasks

  • [ ] Add the characters described here https://github.com/linz/geostore/issues/2105#issuecomment-1270691825
  • [ ] ...

Definition of Ready

  • [ ] This story is ready to work on
    • [ ] Independent (story is independent of all other tasks)
    • [ ] Negotiable (team can decide how to design and implement)
    • [ ] Valuable (from a user perspective)
    • [ ] Estimate value applied (agreed by team)
    • [ ] Small (so as to fit within an iteration)
    • [ ] Testable (in principle, even if there isn't a test for it yet)
    • [ ] Environments are ready to meet definition of done
    • [ ] Resources required to implement will be ready
    • [ ] Everyone understands and agrees with the tasks to complete the story
    • [ ] Release value (e.g. Iteration 3) applied
    • [ ] Sprint value (e.g. Aug 1 - Aug 15) applied

Definition of Done

  • [ ] This story is done:
    • [ ] Acceptance criteria completed
    • [ ] Automated tests are passing
    • [ ] Code is peer reviewed and pushed to master
    • [ ] Deployed successfully to test environment
    • [ ] Checked against CODING guidelines
    • [ ] Relevant new tasks are added to backlog and communicated to the team
    • [ ] Important decisions recorded in the issue ticket
    • [ ] Readme/Changelog/Diagrams are updated
    • [ ] Product Owner has approved acceptance criteria as complete
    • [ ] Meets non-functional requirements:
      • [ ] Scalability (data): Can scale to 300TB of data and 100,000,000 files and ability to increase 10% every year
      • [ ] Scalability (users): Can scale to 100 concurrent users
      • [ ] Cost: Data can be stored at < 0.5 NZD per GB per year
      • [ ] Performance: A large dataset (500 GB and 50,000 files - e.g. Akl aerial imagery) can be validated, imported and stored within 24 hours
      • [ ] Accessibility: Can be used from LINZ networks and the public internet
      • [ ] Availability: System available 24 hours a day and 7 days a week, this does not include maintenance windows < 4 hours and does not include operational support
      • [ ] Recoverability: RPO of fully imported datasets < 4 hours, RTO of a single 3 TB dataset < 12 hours

billgeo, Aug 30 '22 21:08