Allow renaming datasets & datasets with duplicate names
Further Notes:
- Many of the changed lines are the result of moving the ObjectId class to the utils package so that all wk backend servers have access to this class.
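For illustration only, a minimal sketch of what such a shared ObjectId wrapper in the utils package could look like (package name and helpers here are assumptions, not the actual implementation):

```scala
// Hypothetical sketch: a self-contained ObjectId in the shared utils package,
// so that every wk backend server can depend on it.
package com.scalableminds.util.objectid // assumed package name

import java.security.SecureRandom

case class ObjectId(id: String) {
  override def toString: String = id
}

object ObjectId {
  private val random = new SecureRandom()

  // Generate a 24-character hex id, mirroring the BSON ObjectId format.
  def generate: ObjectId = {
    val bytes = new Array[Byte](12)
    random.nextBytes(bytes)
    ObjectId(bytes.map("%02x".format(_)).mkString)
  }

  // Accept only well-formed 24-character hex strings from user input.
  def fromString(s: String): Option[ObjectId] =
    if (s.length == 24 && s.forall(c => c.isDigit || ('a' to 'f').contains(c.toLower)))
      Some(ObjectId(s))
    else None
}
```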
URL of deployed dev instance (used for testing):
- https://___.webknossos.xyz
Steps to test:
- Give two datasets the same name and check whether annotations and so on still work
- Test whether the task system still works with duplicate dataset names
- Check dataset creation workflows:
  - dataset upload
  - add remote
  - compose
  - ...
TODOs:
- [x] Add evolution and reversion
- [x] testing needed
- [x] Test uploading:
- [x] Report upload fails
- [x] Adjust worker to newest job arguments as the dataset name can no longer be used to uniquely identify a dataset
- [x] rename `organization_name` in worker to `organization_id` (see #8038)
- [x] Dataset Name settings field has an unwanted spinner (see upload view)
- [x] Check the job list
- [x] Properly implement legacy searching for datasets when old URI param is used
- [x] Adjust legacy API routes to return dataset in old format
- It is just an additional field. Thus, I would say it should be fine.
- [x] datasets appear to be duplicated in the db
- Maybe these are created by jobs with an output dataset
- [x] Fix dataset insert
- [x] Skeleton & VolumeTracings address a dataset via its name
- Not really used, only during task / annotation creation
- Use a heuristic upon upload and temporarily patch the Tracing case classes to carry the datasetId during the creation process, once the dataset has been identified (see the sketch after this list).
- Task creation works
- [x] Needs testing
- fix annotation upload
- [x] needs to support old nmls
- [x] Put datasetId into newly created nmls
- [x] In the backend, `LinkedLayerIdentifier` still uses the datasetName as an identifier. It is used in wklibs; maybe just interpret the name as a path and work with this. In case it cannot be found, the user needs to update wklibs. Add a comment for this!
- ~~[ ] The dataset `C555_tps_demo` has quite some bucket loading errors. Unsure why some buckets do not work~~ The dataset seems to be broken; could reproduce this on other branches
- [x] Notion-style URLs are missing (i.e. …, but only the id part is actually used)
- [x] Maybe remove `DatasetURIParser`
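A rough sketch of the resolution heuristic from the Skeleton & VolumeTracings item above (all names here are hypothetical; the real code temporarily patches the Tracing case classes):

```scala
// Hypothetical sketch of the upload heuristic: old NMLs reference a dataset
// only by name, so during task / annotation creation the name is resolved to
// a dataset id once, and that id is then carried through the creation process.
case class TracingDuringCreation(datasetId: String /* , ...tracing fields... */)

// Dummy in-memory lookup standing in for the real DB query:
// (organization id, dataset name) -> dataset id.
val datasetIdsByName: Map[(String, String), String] =
  Map(("sample_organization", "l4_sample") -> "6735ed6c010000d2004849e6")

def resolveDatasetId(organizationId: String, nmlDatasetName: String): Either[String, TracingDuringCreation] =
  datasetIdsByName.get((organizationId, nmlDatasetName)) match {
    case Some(datasetId) => Right(TracingDuringCreation(datasetId))
    case None =>
      Left(s"No dataset named '$nmlDatasetName' found in organization '$organizationId'")
  }
```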
Issues:
- fixes #6613 and #8038
(Please delete unneeded items, merge only when none are left open)
- [ ] Updated changelog
- [ ] Updated migration guide if applicable
- [ ] Updated documentation if applicable
- [ ] Adapted wk-libs python client if relevant API parts change
- [ ] Removed dev-only changes like prints and application.conf edits
- [ ] Considered common edge cases
- [ ] Needs datastore update after deployment
@fm3 The backend should be ready for a first glance :eyes:
Note: We agreed to rename `dataset.path` to `dataset.directory_name` as this is clearer. See here for more details: https://scm.slack.com/archives/C5AKLAV0B/p1729689141538769
@coderabbitai full review
:white_check_mark: Actions performed
Full review triggered.
[!NOTE]
Reviews paused
Use the following commands to manage reviews:
- `@coderabbitai resume` to resume automatic reviews.
- `@coderabbitai review` to trigger a single review.
@coderabbitai pause
as the PR is too large anyway
@MichaelBuessemeyer: Understood, I will pause the automatic reviews for this PR.
:white_check_mark: Actions performed
Reviews paused.
I think this should be ready to go now. In case you notice some errors please tell me :pray:
Getting close to +3k :tada: :laughing:
@daniel-wer Could you please take over the frontend review of this PR? Before you start, I'd suggest we talk a little about the high-level changes so that the review is easier for you.
Sorry, I could not find all the right threads here to reply to; these are my latest notes on the open questions:
- [x] TaskCreationParameters: let’s do the api adapter
- [x] create from files (from json) is missing
- [x] Edit: Shouldn't be necessary as the frontend simply sends too much information. TODO: double-check this.
- looks fine to me (@MichaelBuessemeyer)
- [x] Job resultLink → let's do option 1 now and write a follow-up issue; discuss there whether 2 or 3 is better
- [x] messages → let’s go with the existing dataset.notFound
- [x] NMLParser → catch exceptions in the xml parsing, then use the resulting box as input for the async stuff (see the sketch after this list)
- [x] New wish: In settings view, add “Dataset Configuration as stored on datastore data-humerus at organization_name/datasetDirectoryName” to headline in advanced view
- [x] In OpenGraphService, add a comment explaining what `datasetNameAndId.split("-").lastOption` is about (id+name uris)
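For the NMLParser item, a minimal sketch of the suggested flow, assuming Lift's `Box`/`tryo` (names and the surrounding async plumbing are illustrative):

```scala
// Sketch: catch exceptions in the synchronous XML parsing step, then use the
// resulting Box as input for the async processing. Assumes Lift's Box/tryo.
import net.liftweb.common.{Box, Failure, Full}
import net.liftweb.util.Helpers.tryo
import scala.concurrent.{ExecutionContext, Future}
import scala.xml.{Elem, XML}

def parseNmlXml(nml: String): Box[Elem] =
  tryo(XML.loadString(nml)) // exceptions become a Failure instead of escaping

def processNml(nml: String)(implicit ec: ExecutionContext): Future[Elem] =
  parseNmlXml(nml) match {
    case Full(elem)       => Future.successful(elem) // continue async processing
    case failure: Failure => Future.failed(new Exception(failure.msg))
    case _                => Future.failed(new Exception("NML parsing failed"))
  }
```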
When clicking the dataset name link at the top of the dataset settings page (e.g. https://allowdatasetrenaming.webknossos.xyz/datasets/l4_sample-6735ed6c010000d2004849e6/edit), the link https://allowdatasetrenaming.webknossos.xyz/datasets/6735ed6c010000d2004849e6 is opened, which makes webknossos crash.
Thanks for finding this. It should work again now
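For reference, a tiny sketch of the id+name URI handling from the OpenGraphService note above: the readable name prefix is purely cosmetic, and only the trailing id is used for the lookup (assuming ids never contain dashes):

```scala
// Sketch of parsing a notion-style dataset URI segment such as
// "l4_sample-6735ed6c010000d2004849e6": everything before the last "-" is a
// human-readable name and is ignored; only the trailing id is looked up.
def extractDatasetId(datasetNameAndId: String): Option[String] =
  datasetNameAndId.split("-").lastOption.filter(_.nonEmpty)

// extractDatasetId("l4_sample-6735ed6c010000d2004849e6")
//   == Some("6735ed6c010000d2004849e6")
// A bare id without a name prefix also works:
// extractDatasetId("6735ed6c010000d2004849e6") == Some("6735ed6c010000d2004849e6")
```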
In the datasource-properties there is a field `"id"` which is not changed if the dataset name is changed. Is this intentional?
This is ok from my side. The backend does not use this field; instead, the location on the disk/datastore filesystem counts. It's there only for backwards compatibility. Maybe we should set it to an empty string in the future to avoid confusion, but that does not need to be part of this PR. @markbader, do you know if the libs use the id field of the datasource-properties json for anything?
it would be great if @fm3 could test as well
Will do! I’ll have a closer look at the worker jobs then (also compare vx pr https://github.com/scalableminds/voxelytics/pull/3743 )
> do you know if the libs use the id field of the datasource-properties json for anything?
@fm3 The libs currently do not consider the path name but use the id field of the datasource-properties.json file for accessing the name of the dataset. So the getter and setter of Dataset.name use it. I don't see any other occurrences where the id field is used.
> In the datasource-properties there is a field like `"id": { "name": "asdasdasd", "team": "sample_organization" }` which is not changed if the dataset name is changed. Is this intentional?
This is 100% intentional. The reason is that the id in the datasource-properties.json represents a dataset's DataSourceId: a tuple of team: orgaId & name: *directoryName*. This means that name is equivalent to directoryName and not the dataset's name. I know this is unintuitive, but that's how the format / naming has to be due to legacy reasons. Therefore, updating the id.name field when the dataset's name is changed would be semantically wrong, as it would mean that the user changes the directoryName, and this should not be possible in all cases.
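To make that naming pitfall explicit, here is a minimal sketch of the structure described above (modeled directly after the description; not the actual class definition):

```scala
// The "id" object in datasource-properties.json models a dataset's DataSourceId.
// Caution: despite the field names, "team" holds the organization id and "name"
// holds the directoryName on disk, NOT the (now renamable) dataset name.
case class DataSourceId(
    name: String, // actually the directoryName, for legacy reasons
    team: String  // actually the organization id, for legacy reasons
)

val example = DataSourceId(name = "asdasdasd", team = "sample_organization")
```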
Moving a single dataset from the dataset list into a folder using drag&drop fails with "Could not move dataset. Please try again." However, batch moving of multiple datasets works.
Oh thanks for finding this. I'll take a look :)
> The libs currently do not consider the path name but use the id field of the datasource-properties.json file for accessing the name of the dataset. So the getter and setter of Dataset.name use it. I don't see any other occurrences where the id field is used.
Hmm ok I guess we’ll have to change that at some point when adapting to this renamable datasets stuff. But shouldn’t break anything now :crossed_fingers:
> Moving a single dataset from the dataset list into a folder using drag&drop fails with "Could not move dataset. Please try again." However, batch moving of multiple datasets works.
Should be fixed now. Was a little buggy :bug:. While fixing this, I tried to also improve the typing.
I took the liberty to push two commits, one renames some local variables, and one fixes starting an infer_with_model job in the AiModelController. Hope that’s ok for you!
Sure, thanks a lot :pray:
I now removed the dataSet field from the jsonified task object and added the legacy adaptation.
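A minimal sketch of what such a legacy adaptation can look like, assuming Play's JSON API (the helper name is hypothetical): versioned legacy routes patch the removed dataSet field back into the serialized task before responding.

```scala
// Hypothetical sketch of a legacy API adapter using Play JSON: newer task JSON
// only carries datasetName/datasetId, so old API versions get the removed
// legacy "dataSet" field patched back in before the response is returned.
import play.api.libs.json.{JsObject, JsString}

def addLegacyDataSetField(taskJson: JsObject): JsObject =
  (taskJson \ "datasetName").asOpt[String] match {
    case Some(datasetName) => taskJson + ("dataSet" -> JsString(datasetName))
    case None              => taskJson // nothing to patch
  }
```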
I checked the routes that might need adaptation and concluded that the following do not need it. Or do they?
- /tasks/:id/assign controllers.TaskController.assignOne
- /taskTypes/:id/tasks controllers.TaskController.listTasksForType
- /tasks/list controllers.TaskController.listTasks
- /user/tasks/peek controllers.TaskController.peekNext
- /tasks/:id controllers.TaskController.update
- /annotations/:id/addAnnotationLayer controllers.AnnotationController.addAnnotationLayerWithoutType
- /annotations/:typ/:id/addAnnotationLayer controllers.AnnotationController.addAnnotationLayer
- /datasets/:datasetId/createExplorational controllers.AnnotationController.createExplorational
- /annotations/:id/downsample controllers.AnnotationController.downsampleWithoutType
- /annotations/:typ/:id/downsample controllers.AnnotationController.downsample
- /annotations/:typ/:id/duplicate controllers.AnnotationController.duplicate
- /annotations/:typ/:id/editLockedState controllers.AnnotationController.editLockedState
- /annotations/:typ/:id/finish controllers.AnnotationController.finish
- /datasets/:datasetId/sandbox/:typ controllers.AnnotationController.getSandbox
- /annotations/:typ/:id/info controllers.AnnotationController.info
- /annotations/:typ/:id/makeHybrid controllers.AnnotationController.makeHybrid
- /annotations/:typ/:id/merge/:mergedTyp/:mergedId controllers.AnnotationController.merge
- /annotations/:typ/:id/reopen controllers.AnnotationController.reopen
- /annotations/:typ/:id/reset controllers.AnnotationController.reset
- /annotations/:typ/:id/transfer controllers.AnnotationController.transfer
I'll do testing of the new legacy routes later
> I checked the routes that might need adaptation and concluded that the following do not need it. Or do they?
Looks right to me!
Hi @fm3,
I added the required legacy routes to remove dataSet from the json-serialized task objects in the newest API version. Please find below the links & curl scripts to test these routes against the dev instance. All included ids & dataset names should exist on the dev instance, so just clicking the links should work. As these are legacy routes, the results should include the legacy field dataSet in each task object. Some routes return an annotation which itself has a task.
For the curl commands, you first need to fill in the id cookie before you can run the scripts.
Before testing, here is one more important thing to double-check: the legacy routes now include both `dataSet` and `datasetName`. Can wklibs handle / ignore new, unexpected json fields?
Checklist for testing new Legacy Routes
- https://allowdatasetrenaming.webknossos.xyz/api/v8/tasks/67464831010000ea00ca5839
- https://allowdatasetrenaming.webknossos.xyz/api/v7/tasks/67464831010000ea00ca5839
- https://allowdatasetrenaming.webknossos.xyz/api/v6/tasks/67464831010000ea00ca5839
- https://allowdatasetrenaming.webknossos.xyz/api/v5/tasks/67464831010000ea00ca5839
- https://allowdatasetrenaming.webknossos.xyz/api/v4/tasks/67464831010000ea00ca5839 -> should show error
- https://allowdatasetrenaming.webknossos.xyz/api/v8/projects/6735ed2e0100008c00484975/tasks
- https://allowdatasetrenaming.webknossos.xyz/api/v7/projects/6735ed2e0100008c00484975/tasks
- ...
- https://allowdatasetrenaming.webknossos.xyz/api/v8/annotations/6746db83010000b100416ed2/info?timestamp=0
- https://allowdatasetrenaming.webknossos.xyz/api/v7/annotations/6746db83010000b100416ed2/info?timestamp=0
- ...
- https://allowdatasetrenaming.webknossos.xyz/api/v8/tasks/6746482e010000d500ca5835/annotations
- https://allowdatasetrenaming.webknossos.xyz/api/v7/tasks/6746482e010000d500ca5835/annotations
- ...
Testing create & update task -> :shrug: I did this manually in Firefox with the edit & resend feature on the dev instance.
In case you want to test more complicated routes:
curl command for checking task creation (setting id cookie is required)
```
curl 'https://allowdatasetrenaming.webknossos.xyz/api/tasks' \
-H 'accept: application/json' \
-H 'accept-language: en-US,en;q=0.9,de;q=0.8' \
-H 'cache-control: no-cache' \
-H 'content-type: application/json' \
-H 'cookie: id=<fill-me-in>' \
-H 'origin: https://allowdatasetrenaming.webknossos.xyz' \
-H 'pragma: no-cache' \
-H 'priority: u=1, i' \
-H 'referer: https://allowdatasetrenaming.webknossos.xyz/tasks/create' \
-H 'sec-ch-ua: "Google Chrome";v="131", "Chromium";v="131", "Not_A Brand";v="24"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "Linux"' \
-H 'sec-fetch-dest: empty' \
-H 'sec-fetch-mode: cors' \
-H 'sec-fetch-site: same-origin' \
-H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36' \
--data-raw '[{"taskTypeId":"63721e2cef0100470266c485","neededExperience":{"domain":"sampleExp","value":1},"pendingInstances":1,"projectName":"sampleProject","boundingBox":null,"dataSet":"kiwi","editPosition":[0,0,0],"editRotation":[0,0,0],"baseAnnotation":null}]'
```
Test task update: (setting id cookie is required)
```
curl 'https://allowdatasetrenaming.webknossos.xyz/api/tasks/6746ecc6010000d80015ea8c' \
-X 'PUT' \
-H 'accept: application/json' \
-H 'accept-language: en-US,en;q=0.9,de;q=0.8' \
-H 'cache-control: no-cache' \
-H 'content-type: application/json' \
-H 'cookie: id=<fill-me-in>' \
-H 'origin: https://allowdatasetrenaming.webknossos.xyz' \
-H 'pragma: no-cache' \
-H 'priority: u=1, i' \
-H 'referer: https://allowdatasetrenaming.webknossos.xyz/tasks/6746ecc6010000d80015ea8c/edit' \
-H 'sec-ch-ua: "Google Chrome";v="131", "Chromium";v="131", "Not_A Brand";v="24"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "Linux"' \
-H 'sec-fetch-dest: empty' \
-H 'sec-fetch-mode: cors' \
-H 'sec-fetch-site: same-origin' \
-H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36' \
--data-raw '{"taskTypeId":"63721e2cef0100470266c485","neededExperience":{"domain":"sampleExp","value":1},"pendingInstances":2,"projectName":"sampleProject","boundingBox":null,"datasetId":"kiwi","editPosition":[0,0,0],"editRotation":[0,0,0]}'
```
Merge conflicts & your two comments are resolved now, so it's ready for :ship:
:tada: :crossed_fingers: