Allow renaming datasets & datasets with duplicate names
Further Notes:
- Many of the changed lines are the result of moving the ObjectId class to the utils package so that all wk backend servers have access to this class.
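For illustration only, a minimal sketch of what such a shared ObjectId wrapper in the utils package could look like (package name and helpers here are assumptions, not the actual implementation):

```scala
// Hypothetical sketch: a self-contained ObjectId in the shared utils package,
// so that every wk backend server can depend on it.
package com.scalableminds.util.objectid // assumed package name

import java.security.SecureRandom

case class ObjectId(id: String) {
  override def toString: String = id
}

object ObjectId {
  private val random = new SecureRandom()

  // Generate a 24-character hex id, mirroring the BSON ObjectId format.
  def generate: ObjectId = {
    val bytes = new Array[Byte](12)
    random.nextBytes(bytes)
    ObjectId(bytes.map("%02x".format(_)).mkString)
  }

  // Accept only well-formed 24-character hex strings from user input.
  def fromString(s: String): Option[ObjectId] =
    if (s.length == 24 && s.forall(c => c.isDigit || ('a' to 'f').contains(c.toLower)))
      Some(ObjectId(s))
    else None
}
```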
URL of deployed dev instance (used for testing):
- https://___.webknossos.xyz
Steps to test:
- Give two datasets the same name and check whether annotations and so on still work
- Test whether the task system still works with duplicate dataset names
- Check dataset creation workflows:
  - dataset upload
  - add remote
  - compose
  - ...
TODOs:
- [x] Add evolution and reversion
- [x] testing needed
- [x] Test uploading:
- [x] Report upload fails
- [x] Adjust worker to newest job arguments as the dataset name can no longer be used to uniquely identify a dataset
- [x] rename `organization_name` in worker to `organization_id` (see #8038)
- [x] Dataset Name settings field has an unwanted spinner (see upload view)
- [x] Check the job list
- [x] Properly implement legacy searching for datasets when old URI param is used
- [x] Adjust legacy API routes to return dataset in old format
- It is just an additional field. Thus, I would say it should be fine.
- [x] datasets appear to be duplicated in the db
- Maybe these are created by jobs with an output dataset
- [x] Fix dataset insert
- [x] Skeleton & VolumeTracings address a dataset via its name
- Not really used, only during task / annotation creation
- Use a heuristic upon upload and temporarily patch the Tracing case classes to carry the datasetId during the creation process, once the dataset has been identified (see the sketch after this list).
- Task creation works
- [x] Needs testing
- fix annotation upload
- [x] needs to support old nmls
- [x] Put datasetId into newly created nmls
- [x] In the backend, `LinkedLayerIdentifier` still uses the datasetName as an identifier. It is used in wklibs; maybe just interpret the name as a path and work with this. In case it cannot be found, the user needs to update wklibs. Add a comment for this!
- ~~[ ] The dataset `C555_tps_demo` has quite some bucket loading errors. Unsure why some buckets do not work~~ The dataset seems to be broken; could reproduce this on other branches
- [x] Notion-style URLs are missing (i.e. …, but only the id part is actually used)
- [x] Maybe remove `DatasetURIParser`
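A rough sketch of the resolution heuristic from the Skeleton & VolumeTracings item above (all names here are hypothetical; the real code temporarily patches the Tracing case classes):

```scala
// Hypothetical sketch of the upload heuristic: old NMLs reference a dataset
// only by name, so during task / annotation creation the name is resolved to
// a dataset id once, and that id is then carried through the creation process.
case class TracingDuringCreation(datasetId: String /* , ...tracing fields... */)

// Dummy in-memory lookup standing in for the real DB query:
// (organization id, dataset name) -> dataset id.
val datasetIdsByName: Map[(String, String), String] =
  Map(("sample_organization", "l4_sample") -> "6735ed6c010000d2004849e6")

def resolveDatasetId(organizationId: String, nmlDatasetName: String): Either[String, TracingDuringCreation] =
  datasetIdsByName.get((organizationId, nmlDatasetName)) match {
    case Some(datasetId) => Right(TracingDuringCreation(datasetId))
    case None =>
      Left(s"No dataset named '$nmlDatasetName' found in organization '$organizationId'")
  }
```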
Issues:
- fixes #6613 and #8038
(Please delete unneeded items, merge only when none are left open)
- [ ] Updated changelog
- [ ] Updated migration guide if applicable
- [ ] Updated documentation if applicable
- [ ] Adapted wk-libs python client if relevant API parts change
- [ ] Removed dev-only changes like prints and application.conf edits
- [ ] Considered common edge cases
- [ ] Needs datastore update after deployment
@fm3 The backend should be ready for a first glance :eyes:
Note: We agreed to rename `dataset.path` to `dataset.directory_name` as this is clearer. See here for more details: https://scm.slack.com/archives/C5AKLAV0B/p1729689141538769
@coderabbitai full review
:white_check_mark: Actions performed
Full review triggered.
[!NOTE]
Reviews paused
Use the following commands to manage reviews:
- `@coderabbitai resume` to resume automatic reviews.
- `@coderabbitai review` to trigger a single review.
@coderabbitai pause
as the PR is too large anyway
@MichaelBuessemeyer: Understood, I will pause the automatic reviews for this PR.
:white_check_mark: Actions performed
Reviews paused.
I think this should be ready to go now. In case you notice some errors please tell me :pray:
Getting close to +3k :tada: :laughing:
@daniel-wer Could you please take over the frontend review of this PR? Before you start, I'd suggest we talk a little about the high-level changes so that the review is easier for you.
Sorry, I could not find all the right threads here to reply to; these are my latest notes on the open questions:
- [x] TaskCreationParameters: let’s do the api adapter
- [x] create from files (from json) is missing
- [x] Edit: Shouldn't be necessary as the frontend simply sends too much information. TODO: double-check this.
- looks fine to me (@MichaelBuessemeyer)
- [x] Job resultLink → let's do option 1 now and write a follow-up issue; discuss there whether 2 or 3 is better
- [x] messages → let’s go with the existing dataset.notFound
- [x] NMLParser → catch exceptions in the xml parsing, then use the resulting box as input for the async stuff (see the sketch after this list)
- [x] New wish: In settings view, add “Dataset Configuration as stored on datastore data-humerus at organization_name/datasetDirectoryName” to headline in advanced view
- [x] In OpenGraphService, add a comment explaining what `datasetNameAndId.split("-").lastOption` is about (id+name uris)
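For the NMLParser item, a minimal sketch of the suggested flow, assuming Lift's `Box`/`tryo` (names and the surrounding async plumbing are illustrative):

```scala
// Sketch: catch exceptions in the synchronous XML parsing step, then use the
// resulting Box as input for the async processing. Assumes Lift's Box/tryo.
import net.liftweb.common.{Box, Failure, Full}
import net.liftweb.util.Helpers.tryo
import scala.concurrent.{ExecutionContext, Future}
import scala.xml.{Elem, XML}

def parseNmlXml(nml: String): Box[Elem] =
  tryo(XML.loadString(nml)) // exceptions become a Failure instead of escaping

def processNml(nml: String)(implicit ec: ExecutionContext): Future[Elem] =
  parseNmlXml(nml) match {
    case Full(elem)       => Future.successful(elem) // continue async processing
    case failure: Failure => Future.failed(new Exception(failure.msg))
    case _                => Future.failed(new Exception("NML parsing failed"))
  }
```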
When clicking the dataset name link at the top of the dataset settings page (e.g. https://allowdatasetrenaming.webknossos.xyz/datasets/l4_sample-6735ed6c010000d2004849e6/edit), the link https://allowdatasetrenaming.webknossos.xyz/datasets/6735ed6c010000d2004849e6 is opened, which makes webknossos crash.
Thanks for finding this. It should work again now
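For reference, a tiny sketch of the id+name URI handling from the OpenGraphService note above: the readable name prefix is purely cosmetic, and only the trailing id is used for the lookup (assuming ids never contain dashes):

```scala
// Sketch of parsing a notion-style dataset URI segment such as
// "l4_sample-6735ed6c010000d2004849e6": everything before the last "-" is a
// human-readable name and is ignored; only the trailing id is looked up.
def extractDatasetId(datasetNameAndId: String): Option[String] =
  datasetNameAndId.split("-").lastOption.filter(_.nonEmpty)

// extractDatasetId("l4_sample-6735ed6c010000d2004849e6")
//   == Some("6735ed6c010000d2004849e6")
// A bare id without a name prefix also works:
// extractDatasetId("6735ed6c010000d2004849e6") == Some("6735ed6c010000d2004849e6")
```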
In the datasource-properties there is a field `"id"` which is not changed if the dataset name is changed. Is this intentional?
This is ok from my side. The backend does not use this field; instead, the location on the disk/datastore filesystem counts. It's there only for backwards compatibility. Maybe we should set it to an empty string in the future to avoid confusion, but that does not need to be part of this PR. @markbader, do you know if the libs use the id field of the datasource-properties json for anything?
it would be great if @fm3 could test as well
Will do! I’ll have a closer look at the worker jobs then (also compare vx pr https://github.com/scalableminds/voxelytics/pull/3743 )
> do you know if the libs use the id field of the datasource-properties json for anything?
@fm3 The libs currently do not consider the path name but use the id field of the datasource-properties.json file for accessing the name of the dataset. So the getter and setter of Dataset.name use it. I don't see any other occurrences where the id field is used.
> In the datasource-properties there is a field like `"id": { "name": "asdasdasd", "team": "sample_organization" }` which is not changed if the dataset name is changed. Is this intentional?
This is 100% intentional. The reason is that the id in the datasource-properties.json represents a dataset's DataSourceId: a tuple of team: orgaId & name: *directoryName*. This means that name is equivalent to directoryName and not the dataset's name. I know this is unintuitive, but that's how the format / naming has to be due to legacy reasons. Therefore, updating the id.name field when the dataset's name is changed would be semantically wrong, as it would mean that the user changes the directoryName, and this should not be possible in all cases.
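To make that naming pitfall explicit, here is a minimal sketch of the structure described above (modeled directly after the description; not the actual class definition):

```scala
// The "id" object in datasource-properties.json models a dataset's DataSourceId.
// Caution: despite the field names, "team" holds the organization id and "name"
// holds the directoryName on disk, NOT the (now renamable) dataset name.
case class DataSourceId(
    name: String, // actually the directoryName, for legacy reasons
    team: String  // actually the organization id, for legacy reasons
)

val example = DataSourceId(name = "asdasdasd", team = "sample_organization")
```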
Moving a single dataset from the dataset list into a folder using drag&drop fails with "Could not move dataset. Please try again." However, batch moving of multiple datasets works.
Oh thanks for finding this. I'll take a look :)
> The libs currently do not consider the path name but use the id field of the datasource-properties.json file for accessing the name of the dataset. So the getter and setter of Dataset.name use it. I don't see any other occurrences where the id field is used.
Hmm ok I guess we’ll have to change that at some point when adapting to this renamable datasets stuff. But shouldn’t break anything now :crossed_fingers:
> Moving a single dataset from the dataset list into a folder using drag&drop fails with "Could not move dataset. Please try again." However, batch moving of multiple datasets works.
Should be fixed now. Was a little buggy :bug:. While fixing this, I tried to also improve the typing.
I took the liberty to push two commits, one renames some local variables, and one fixes starting an infer_with_model job in the AiModelController. Hope that’s ok for you!
Sure, thanks a lot :pray:
I now removed the dataSet field from the jsonified task object and added the legacy adaptation.
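A minimal sketch of what such a legacy adaptation can look like, assuming Play's JSON API (the helper name is hypothetical): versioned legacy routes patch the removed dataSet field back into the serialized task before responding.

```scala
// Hypothetical sketch of a legacy API adapter using Play JSON: newer task JSON
// only carries datasetName/datasetId, so old API versions get the removed
// legacy "dataSet" field patched back in before the response is returned.
import play.api.libs.json.{JsObject, JsString}

def addLegacyDataSetField(taskJson: JsObject): JsObject =
  (taskJson \ "datasetName").asOpt[String] match {
    case Some(datasetName) => taskJson + ("dataSet" -> JsString(datasetName))
    case None              => taskJson // nothing to patch
  }
```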
I checked the routes that might need adaptation and concluded that the following do not need it. Or do they?
- /tasks/:id/assign controllers.TaskController.assignOne
- /taskTypes/:id/tasks controllers.TaskController.listTasksForType
- /tasks/list controllers.TaskController.listTasks
- /user/tasks/peek controllers.TaskController.peekNext
- /tasks/:id controllers.TaskController.update
- /annotations/:id/addAnnotationLayer controllers.AnnotationController.addAnnotationLayerWithoutType
- /annotations/:typ/:id/addAnnotationLayer controllers.AnnotationController.addAnnotationLayer
- /datasets/:datasetId/createExplorational controllers.AnnotationController.createExplorational
- /annotations/:id/downsample controllers.AnnotationController.downsampleWithoutType
- /annotations/:typ/:id/downsample controllers.AnnotationController.downsample
- /annotations/:typ/:id/duplicate controllers.AnnotationController.duplicate
- /annotations/:typ/:id/editLockedState controllers.AnnotationController.editLockedState
- /annotations/:typ/:id/finish controllers.AnnotationController.finish
- /datasets/:datasetId/sandbox/:typ controllers.AnnotationController.getSandbox
- /annotations/:typ/:id/info controllers.AnnotationController.info
- /annotations/:typ/:id/makeHybrid controllers.AnnotationController.makeHybrid
- /annotations/:typ/:id/merge/:mergedTyp/:mergedId controllers.AnnotationController.merge
- /annotations/:typ/:id/reopen controllers.AnnotationController.reopen
- /annotations/:typ/:id/reset controllers.AnnotationController.reset
- /annotations/:typ/:id/transfer controllers.AnnotationController.transfer
I'll do testing of the new legacy routes later
> I checked the routes that might need adaptation and concluded that the following do not need it. Or do they?
Looks right to me!
Hi @fm3,
I added the required legacy routes to remove dataSet from the json-serialized task objects in the newest API version. Please find below the links & curl scripts to test these routes against the dev instance. All included ids & dataset names should exist on the dev instance, so just clicking the links should work. As these are legacy routes, the results should include the legacy field dataSet in each task object. Some routes return an annotation which itself has a task.
For the curl commands, you first need to fill in the id cookie before you can run the scripts.
Before testing, here is one more important thing to double-check: the legacy routes now include both `dataSet` and `datasetName`. Can wklibs handle / ignore new, unexpected json fields?
Checklist for testing new Legacy Routes
- https://allowdatasetrenaming.webknossos.xyz/api/v8/tasks/67464831010000ea00ca5839
- https://allowdatasetrenaming.webknossos.xyz/api/v7/tasks/67464831010000ea00ca5839
- https://allowdatasetrenaming.webknossos.xyz/api/v6/tasks/67464831010000ea00ca5839
- https://allowdatasetrenaming.webknossos.xyz/api/v5/tasks/67464831010000ea00ca5839
- https://allowdatasetrenaming.webknossos.xyz/api/v4/tasks/67464831010000ea00ca5839 -> should show error
- https://allowdatasetrenaming.webknossos.xyz/api/v8/projects/6735ed2e0100008c00484975/tasks
- https://allowdatasetrenaming.webknossos.xyz/api/v7/projects/6735ed2e0100008c00484975/tasks
- ...
- https://allowdatasetrenaming.webknossos.xyz/api/v8/annotations/6746db83010000b100416ed2/info?timestamp=0
- https://allowdatasetrenaming.webknossos.xyz/api/v7/annotations/6746db83010000b100416ed2/info?timestamp=0
- ...
- https://allowdatasetrenaming.webknossos.xyz/api/v8/tasks/6746482e010000d500ca5835/annotations
- https://allowdatasetrenaming.webknossos.xyz/api/v7/tasks/6746482e010000d500ca5835/annotations
- ...
Testing create & update task -> :shrug: I did this manually in Firefox with the edit & resend feature on the dev instance.
In case you want to test more complicated routes:
curl command for checking task creation (setting id cookie is required)
```
curl 'https://allowdatasetrenaming.webknossos.xyz/api/tasks' \
-H 'accept: application/json' \
-H 'accept-language: en-US,en;q=0.9,de;q=0.8' \
-H 'cache-control: no-cache' \
-H 'content-type: application/json' \
-H 'cookie: id=<fill-me-in>' \
-H 'origin: https://allowdatasetrenaming.webknossos.xyz' \
-H 'pragma: no-cache' \
-H 'priority: u=1, i' \
-H 'referer: https://allowdatasetrenaming.webknossos.xyz/tasks/create' \
-H 'sec-ch-ua: "Google Chrome";v="131", "Chromium";v="131", "Not_A Brand";v="24"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "Linux"' \
-H 'sec-fetch-dest: empty' \
-H 'sec-fetch-mode: cors' \
-H 'sec-fetch-site: same-origin' \
-H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36' \
--data-raw '[{"taskTypeId":"63721e2cef0100470266c485","neededExperience":{"domain":"sampleExp","value":1},"pendingInstances":1,"projectName":"sampleProject","boundingBox":null,"dataSet":"kiwi","editPosition":[0,0,0],"editRotation":[0,0,0],"baseAnnotation":null}]'
```
Test task update: (setting id cookie is required)
```
curl 'https://allowdatasetrenaming.webknossos.xyz/api/tasks/6746ecc6010000d80015ea8c' \
-X 'PUT' \
-H 'accept: application/json' \
-H 'accept-language: en-US,en;q=0.9,de;q=0.8' \
-H 'cache-control: no-cache' \
-H 'content-type: application/json' \
-H 'cookie: id=<fill-me-in>' \
-H 'origin: https://allowdatasetrenaming.webknossos.xyz' \
-H 'pragma: no-cache' \
-H 'priority: u=1, i' \
-H 'referer: https://allowdatasetrenaming.webknossos.xyz/tasks/6746ecc6010000d80015ea8c/edit' \
-H 'sec-ch-ua: "Google Chrome";v="131", "Chromium";v="131", "Not_A Brand";v="24"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "Linux"' \
-H 'sec-fetch-dest: empty' \
-H 'sec-fetch-mode: cors' \
-H 'sec-fetch-site: same-origin' \
-H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36' \
--data-raw '{"taskTypeId":"63721e2cef0100470266c485","neededExperience":{"domain":"sampleExp","value":1},"pendingInstances":2,"projectName":"sampleProject","boundingBox":null,"datasetId":"kiwi","editPosition":[0,0,0],"editRotation":[0,0,0]}'
```
Merge conflicts & your two comments are resolved now, so it's ready for :ship:
:tada: :crossed_fingers: