Matching CVAT task/job with dumped data

Open nstolyarov opened this issue 3 years ago • 13 comments

My actions before raising this issue

  • [x] Read/searched the docs
  • [x] Searched past issues

Is there any option to match dumped data (ImageNet or Semantic Segmentation dump e.g.) with the same data in CVAT's task -> job -> id?

Where can I find the image names recorded for a specific task/job? Maybe in the DB, in some table?

Context

After some automatic operations with annotations (e.g. computing metrics for 1,000+ images), I want to know where in CVAT I should fix a wrong annotation.

Your Environment

  • Git hash commit (git log -1):
commit be9e00fa7aeba432690901d54509760eb9ebfba4 (HEAD -> develop, origin/develop, origin/HEAD)
Author: Dmitry Kruchinin <[email protected]>
Date:   Wed Apr 21 14:47:58 2021 +0300

    Update cypress test. Canvas 3D functionality. Basic actions. (#3112)
    
    * Rename, add css class
    
    * Update cypress test.

  • Docker version docker version (e.g. Docker 17.0.05): 20.10.05
  • Are you using Docker Swarm or Kubernetes? Nope
  • Operating System and version (e.g. Linux, Windows, MacOS): GNU/Linux Ubuntu 4.15.0-140-generic

nstolyarov avatar Jun 16 '21 15:06 nstolyarov

In most formats, the dumped files have the same names as they had in the source CVAT task. Which formats did you export the task to? You can find image names in the annotation window of the CVAT task.

CVAT only allows navigating to a specific frame index. If you want to get the image index in CVAT, you can do one of the following:

  a) get it from the dumped data, if the output format includes such information
  b) sort image paths / names lexicographically and get the index
  c) export in CVAT for images / CVAT for video / Datumaro and find "frame index" there
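As a rough sketch of option b), the snippet below maps each image path to its position in plain lexicographic order. It assumes the task was created with lexicographic sorting and that frame numbering starts at 0; the function name `frame_index_by_name` and the sample paths are made up for illustration.

```python
def frame_index_by_name(image_paths):
    """Map each image path to its position in plain lexicographic order."""
    ordered = sorted(image_paths)
    return {path: index for index, path in enumerate(ordered)}


# Hypothetical image paths, not taken from a real task.
paths = ["b/img_10.jpg", "a/img_2.jpg", "b/img_2.jpg"]
index = frame_index_by_name(paths)
print(index["b/img_10.jpg"])
```

Note that plain lexicographic order puts "img_10" before "img_2", so the result only matches CVAT if the task really used this ordering.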

Please describe what you want to do more precisely, so I can help you better.

@bsekachev, adding navigation by image names could be useful.

zhiltsov-max avatar Jun 17 '21 15:06 zhiltsov-max

You can see the current image name in CVAT near frame navigation elements.

Screenshot from 2021-06-17 22-42-45

bsekachev avatar Jun 17 '21 19:06 bsekachev

Speaking about the database, you can see a mapping there if you have access to it. But be very careful when working directly with the database.

docker exec -it cvat_db /bin/bash
/usr/local/bin/createuser -s postgres # if you get error that postgres role does not exist.
psql cvat --user postgres

For the task with ID 9:

SELECT engine_image.frame, engine_image.path from engine_task INNER JOIN engine_data on engine_task.data_id=engine_data.id INNER JOIN engine_image on engine_image.data_id = engine_data.id where engine_task.id=9;

Screenshot from 2021-06-17 23-03-55
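To try the join without touching the real database, here is a minimal sketch that runs the same query against an in-memory SQLite stand-in. The three tables are a heavily simplified assumption (only the columns used in the query above), the sample rows are invented, and an ORDER BY is added for a stable result.

```python
import sqlite3

# Toy copy of the three tables; the real CVAT schema has many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE engine_data (id INTEGER PRIMARY KEY);
CREATE TABLE engine_task (id INTEGER PRIMARY KEY, data_id INTEGER);
CREATE TABLE engine_image (
    id INTEGER PRIMARY KEY, data_id INTEGER, frame INTEGER, path TEXT
);
INSERT INTO engine_data (id) VALUES (1), (2);
INSERT INTO engine_task (id, data_id) VALUES (9, 1), (10, 2);
INSERT INTO engine_image (data_id, frame, path) VALUES
    (1, 0, 'cats/001.jpg'), (1, 1, 'cats/002.jpg'), (2, 0, 'dogs/001.jpg');
""")

QUERY = """
SELECT engine_image.frame, engine_image.path
FROM engine_task
INNER JOIN engine_data ON engine_task.data_id = engine_data.id
INNER JOIN engine_image ON engine_image.data_id = engine_data.id
WHERE engine_task.id = ?
ORDER BY engine_image.frame
"""

# Frame -> path mapping for the task with ID 9 only.
rows = conn.execute(QUERY, (9,)).fetchall()
print(rows)
```

Both toy tasks contain a frame 0, and filtering by engine_task.id is what keeps them apart.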

bsekachev avatar Jun 17 '21 20:06 bsekachev

From the UI point of view, I would suggest adding a feature to search a frame number by its name. Would it be a convenient solution for users in your opinion?

bsekachev avatar Jun 17 '21 20:06 bsekachev

Hi @zhiltsov-max and @bsekachev. Thank you for your answers.

I will try to give a clear example.

Suppose I have a full path to the file like "FULL/PATH/image.jpg". And I may even know the name / id of the task it is in (but maybe not). How can I find the job id and frame id for this image in CVAT?

It would be useful if I had info like the following:

TASK_ID  JOB_ID  FRAME_ID  IMG_PATH
111      26      874       full/path/to/image.jpg
103      13      234       full/path/to/another_image.jpg

Is there a possibility to get it from CVAT?

I need this when I run some operations on annotations (e.g. using Semantic mask 1.1) and then need to fix a specific image's annotation.

UPDATE

I've tried the following command in cvat_db

SELECT * from engine_task INNER JOIN engine_data on engine_task.data_id=engine_data.id INNER JOIN engine_image on engine_image.data_id = engine_data.id;

Am I right that

  • stop frame is the last frame in the task?
  • frame is the frame id for this task?
  • id is image id for the whole CVAT?

nstolyarov avatar Jun 18 '21 08:06 nstolyarov

@nstolyarov

stop frame is the last frame in the task?

Not exactly. stop_frame is the last frame in a job. The number of frames in a task is engine_task.size

frame is the frame id for this task?

I would say it is a frame number for this task.

id is image id for the whole CVAT?

I am not sure I understand you. engine_image.id is a primary key in the database, so it is unique within the CVAT instance.

Generally speaking, a frame can be included in two jobs (if overlap is enabled). You can see the range of frames for a specific job on the task page: Screenshot from 2021-06-18 11-56-05
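A minimal sketch of what overlap means for a lookup: given inclusive frame ranges per job, a frame can match more than one job. The job ids and ranges below are hypothetical (three jobs of 20 frames with an overlap of 2), and `jobs_for_frame` is an illustrative helper, not part of CVAT.

```python
def jobs_for_frame(frame, job_ranges):
    """Return ids of all jobs whose inclusive [start, stop] range contains frame."""
    return [
        job_id
        for job_id, (start, stop) in sorted(job_ranges.items())
        if start <= frame <= stop
    ]


# Hypothetical task: 3 jobs of 20 frames each, overlapping by 2 frames.
job_ranges = {26: (0, 19), 27: (18, 37), 28: (36, 55)}

print(jobs_for_frame(18, job_ranges))  # frame inside the overlap of two jobs
print(jobs_for_frame(25, job_ranges))  # frame that belongs to a single job
```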

bsekachev avatar Jun 18 '21 08:06 bsekachev

@bsekachev

Not exactly. stop_frame is the last frame in a job. The number of frames in a task is engine_task.size

It is strange because in the task with 35 jobs I have segment_size=20, stop_frame=680 and size=681 for every row in the table.

Nevertheless, it seems that this is really what I need.

Thank you very much for your help.

nstolyarov avatar Jun 18 '21 09:06 nstolyarov

It is strange because in the task with 35 jobs I have segment_size=20, stop_frame=680 and size=681 for every row in the table.

Sounds really strange. This is a piece of the table engine_segment: Screenshot from 2021-06-18 12-49-16

You can see here that the start_frame and stop_frame values differ between rows with the same task_id.

bsekachev avatar Jun 18 '21 09:06 bsekachev

I would find it useful if, when exporting annotations, I could get a list of each image name (the file path) and its image number ("2" in the example image I tried to upload). Then, if someone annotating images finds some with issues, she can record the numbers of those images and I can exclude them from my dataset. For this example, I'd have a table with a row like:
bristlecone2.PNG, 2
Is this available? It sounds like this is what @nstolyarov is asking, but I'm not sure.

image

mikeyEcology avatar Jun 24 '21 12:06 mikeyEcology

@mikeyEcology this is in the engine_image table.

select path, frame from engine_image will give you that. Just be aware that the frame number is repeated across tasks. For example, Task A will have a frame 2 and Task B will have a frame 2.

MattWittbrodt avatar Sep 29 '21 18:09 MattWittbrodt

Try this:

CREATE VIEW task_job_frame AS
SELECT DISTINCT
    ep.id AS project_id,
    s.task_id AS task_id,
    j.id AS job_id,
    s.frame_id,
    ei.path
FROM engine_job j
INNER JOIN (
    -- expand every segment into one row per frame
    SELECT id, task_id, generate_series(start_frame, stop_frame) AS frame_id
    FROM engine_segment
) s ON j.segment_id = s.id
INNER JOIN engine_task et ON s.task_id = et.id
-- match images through the task's data_id (engine_task.data_id), not the task id
INNER JOIN engine_image ei ON ei.frame = s.frame_id AND ei.data_id = et.data_id
INNER JOIN engine_project ep ON et.project_id = ep.id;

avengersassemble avatar Apr 22 '24 16:04 avengersassemble

@avengersassemble @MattWittbrodt Is there any way to get the above information using cvat-cli or the API? I have a single task which is divided into multiple non-overlapping jobs. Annotators have not done certain jobs, and in the jobs that they have finished there are some corrupt images (no annotations). I want to distinguish the images which are corrupt from the ones which have not been annotated yet. I do not have access to cvat-db and am looking for a solution using the API/CLI. Can anyone help?

Jain-Archit avatar Jul 02 '24 16:07 Jain-Archit

Hi, please check if the get_meta() and get_frames_info() methods of Task and Job in the high-level SDK are useful. Example 1, example 2, more complex example 3 with lower level API.
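A hedged sketch of how such a task / job / frame / name table might be assembled with the high-level SDK. The pure helper `build_frame_table` is hypothetical and runs offline. `fetch_frame_table` assumes the `cvat-sdk` package, its `make_client`, `client.tasks.retrieve`, `Task.get_jobs` and `Task.get_frames_info` calls, and `start_frame` / `stop_frame` fields on jobs; check these against your SDK version. It also assumes that job frame numbers index directly into the list returned by get_frames_info().

```python
def build_frame_table(task_id, frame_names, job_ranges):
    """Return (task_id, job_id, frame, name) rows for every frame of every job."""
    rows = []
    for job_id, (start, stop) in sorted(job_ranges.items()):
        for frame in range(start, stop + 1):  # stop is inclusive
            rows.append((task_id, job_id, frame, frame_names[frame]))
    return rows


def fetch_frame_table(host, user, password, task_id):
    """Collect the inputs from a live server (assumed SDK calls, not run here)."""
    from cvat_sdk import make_client  # requires the cvat-sdk package

    with make_client(host, credentials=(user, password)) as client:
        task = client.tasks.retrieve(task_id)
        frame_names = [frame.name for frame in task.get_frames_info()]
        job_ranges = {
            job.id: (job.start_frame, job.stop_frame) for job in task.get_jobs()
        }
    return build_frame_table(task_id, frame_names, job_ranges)


# Offline usage with made-up values.
rows = build_frame_table(111, ["a.jpg", "b.jpg", "c.jpg"], {5: (0, 1), 6: (2, 2)})
for row in rows:
    print(*row)
```

Keeping the table-building step separate from the network calls makes it possible to check the logic without a running CVAT server.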

zhiltsov-max avatar Jul 02 '24 17:07 zhiltsov-max