Please offer more information about the annotations

NeoKH opened this issue 3 years ago · 5 comments

@4DVLab Thanks for your work. Is there any detailed explanation of the annotations? I read the paper as well as the supplementary material, but there are still many confusing questions to be resolved :( Maybe the homepage explains everything, but it is unavailable for now. My questions are listed below:

  • [ ] "scene_class" is a int label set from [1,2,3,5,6,7,8,9]. why is 4 missing, and what do these number mean specifically? Different weather condition or different location? Can you offer some description about every scene_calss?
  • [ ] What does "group" "size category" and "isValid" mean?
  • [ ] Which values does the label category to take? For now I just see "category"="person". Why not just offer a clear value list in plain sight?
  • [ ] Since the item's "id" indicate unique identification,why does "number" exists? And Why is the "number" value sometimes an integer, sometimes an ID string?
  • [ ] In the point cloud annotations, item who is serious occluded is not labeled, therefore all information including world position is missing. In the image annotations, only image coordinates(2D) are offered. Acoording the image coordinates(2D) and camera's extrinsics an intrinsics, I can't restore the missing item's world coordinates(3D).
  • [ ] In the "images" annotations, when "image_name"="right/1/xx.jpg", the "class", however, ="left", and there is no item. What does this mean? It seems that the right/xx.jpg and the left/xx.jpg are identical.
  • [ ] the number of the {num_sequence}.json does not seem to be what the paper said.

Can you answer these questions? Looking forward to your reply. P.S. Your README doesn't give much useful information; why not update and extend it?

NeoKH · Oct 24 '22

Most of the details are not very important for training (the code for generating the pkl files will give you a clearer picture):

1. scene_class is just used to divide the training/validation/testing sets (scene_class = 1 and 9 are used for training; a minimal split sketch follows this list). It is decided by the capture location, and 8 is a special group containing the rainy scenes (in rainy cases, the number of LiDAR points differs).
2. "group", "size category", and "isValid" are not used.
3. Our paper deals with detection in crowded scenes, so the only category is person.
4. "id" is enough for tracking.
5. Our evaluation is based on 3D detection of labeled persons; the 2D labels that exist only in the images are redundant for the 2D detection task.
6. We collected the data with a ZED stereo camera, so we release both left and right camera images, but we provide labels only on the left images.
7. num_sequence is the sequence group number (one JSON file covers one continuous sequence of 50-800 frames), and the total frame count is correct.
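A minimal sketch of the split in point 1, assuming the per-sequence JSON files expose "scene_class" as a top-level key (the exact layout may differ from the released pkl-generation code):

```python
# Minimal sketch: group {num_sequence}.json files into train vs. held-out
# sets by scene_class. The "scene_class" key location and the directory
# layout are assumptions, not the official pkl-generation code.
import json
from pathlib import Path

TRAIN_SCENES = {1, 9}  # scene_class values used for training (see point 1)

def split_sequences(anno_dir):
    train, held_out = [], []
    for path in sorted(Path(anno_dir).glob("*.json")):
        with open(path) as f:
            scene_class = json.load(f)["scene_class"]  # assumed top-level key
        (train if scene_class in TRAIN_SCENES else held_out).append(path)
    return train, held_out
```

Everything not in TRAIN_SCENES then goes to validation/testing, however you choose to divide it.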

Thanks for your questions! I will update the README to include more information.

coralemon · Oct 25 '22

Dear coralemon, I have a question. I downloaded the data, and it does not include annotations for the test set, so I decided to take a portion of the validation set as a test set: scenes [1, 9] for the train set, [2, 3] for the validation set, and [5, 6, 7, 8] for the test set. What do you think about this split? I am not sure about the balance between the train, validation, and test sets.

I would greatly appreciate any suggestions from you.

nghiabka · Jan 06 '24

We will refrain from releasing the test annotations, as we may use them for a future challenge. Regarding the division of the data: since the numbers are closely tied to each scene and its density, I recommend a random division approach. This will give a more consistent distribution of scenes across the test and validation sets, with special consideration for Scene 7, which has the highest density.
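A minimal sketch of such a random division, assuming a hypothetical scene_of() helper that maps a sequence id to its scene_class and an even val/test ratio (both are illustrative choices, not part of the released toolkit):

```python
# Minimal sketch of the suggested random division: shuffle each
# non-training scene's sequences and split them between val and test,
# so both sets cover every scene (including the dense Scene 7).
# scene_of() and the 50/50 ratio are illustrative assumptions.
import random
from collections import defaultdict

def random_scene_split(sequences, scene_of, val_ratio=0.5, seed=0):
    rng = random.Random(seed)
    by_scene = defaultdict(list)
    for seq in sequences:
        by_scene[scene_of(seq)].append(seq)  # group sequences by scene_class
    val, test = [], []
    for seqs in by_scene.values():
        rng.shuffle(seqs)
        cut = int(len(seqs) * val_ratio)
        val.extend(seqs[:cut])
        test.extend(seqs[cut:])
    return val, test
```

Because the shuffle happens per scene, every scene contributes to both sets, which keeps the density distribution consistent.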

4DVLab · Jan 06 '24

Thank you for your reply. You mean to keep scenes [1, 9] for training and to use a random strategy to divide scenes [2, 3, 5, 6, 7, 8] between the validation and test sets. Is that right? Sorry, I can't find any document describing the details of each scene index. Can you give more information about these scenes? Thank you very much.

nghiabka · Jan 06 '24

The scenes are just the different capture locations; you can check them from the images.

4DVLab · Jan 08 '24