CubiCasa5k
Dataset contains unscaled images which leads to wrong annotation data
As I understand the dataset, there is always an original image of the plan as well as a scaled version. The scaled version can have larger dimensions than the SVG, but it is scaled to match the model.svg annotation.
While parsing the plans myself, I found that some of them are not scaled accordingly. For example, high_quality_architectural/20107 is such a case (and there are some more).
Both PNGs (original and scaled) are the same size: w: 485px, h: 430px, while the SVG document has a size of w: 1066px, h: 744px.
Even just resizing the PNG to the SVG size won't help, because the SVG is a bit larger than the matching scaling, as you can see here:
*[screenshot: PNG upscaled to SVG size]*
*[screenshot: PNG upscaled to fit the SVG floor plan]*
Thanks for this find. Do you have any additional file paths with this same issue? I could check them out. Taking into account the dataset size, this should not impact larger-scale training that much.
It seems there are far fewer than I thought at the beginning:
high_quality_architectural/20107/F1_scaled.png
high_quality_architectural/4884/F1_scaled.png
high_quality_architectural/8002/F1_scaled.png
But there are some rather strange annotations (more is annotated than is visible on the plan). While this one seems fine, it gives the door a normalised location value over 1.0, which leads to errors in further ML pipelines (for example YOLO):
![](https://user-images.githubusercontent.com/5220162/88065292-b44be280-cb6c-11ea-8bf8-66b920fb9c0d.png)
The black border is from the image tool.
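A small guard in the export step can keep such boxes from crashing downstream pipelines. This is a sketch of one possible fix, not part of the dataset tooling: it clips normalised YOLO-style boxes (x-center, y-center, width, height) into [0, 1] and drops boxes that fall entirely outside the image.

```python
def sanitize_yolo_box(box, eps=1e-6):
    """Clip a normalised (xc, yc, w, h) box into [0, 1].

    Returns the clipped box, or None if nothing of it lies inside the image.
    """
    xc, yc, w, h = box
    # Convert to corner coordinates for clipping.
    x0, y0 = xc - w / 2, yc - h / 2
    x1, y1 = xc + w / 2, yc + h / 2
    x0, y0 = max(0.0, x0), max(0.0, y0)
    x1, y1 = min(1.0, x1), min(1.0, y1)
    if x1 - x0 <= eps or y1 - y0 <= eps:
        return None  # box lies entirely outside the visible plan
    # Convert back to center/size form.
    return ((x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0)
```

Whether clipping or dropping is the right call depends on whether the over-range annotation is merely mis-scaled or genuinely refers to content outside the cropped plan.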
And here we see even more annotated than on the original plan. This happens on two or three more plans (I did not count yet) and can also lead to bugs in further image-classification algorithms:
![](https://user-images.githubusercontent.com/5220162/88065416-db0a1900-cb6c-11ea-91f6-ae96e0d2c5f8.png)
I ran into the same issue whilst trying to benchmark against other algorithms. I'm currently converting the dataset into COCO format so that it is also compatible with different algorithms.
I have already written a script for that, but it is quite painful because you have to interpret SVG path descriptions and so on.
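To illustrate the painful part, here is a minimal sketch of turning a simple SVG path description into a flat polygon for a COCO `segmentation` entry, plus the matching `bbox`. It is an assumption that the paths use only absolute `M`/`L`/`Z` commands; real CubiCasa5k paths may use more of the SVG path grammar (curves, relative commands), which this does not handle.

```python
import re

def path_to_polygon(d):
    """Convert a simple 'M x,y L x,y ... Z' path into [x0, y0, x1, y1, ...].

    Only valid for paths made of absolute M/L/Z commands: it simply pulls
    every numeric token out of the description, ignoring the command letters.
    """
    return [float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", d)]

def polygon_to_bbox(poly):
    """COCO bbox is [x, y, width, height] of the polygon's extent."""
    xs, ys = poly[0::2], poly[1::2]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]
```

For example, `path_to_polygon("M 10,20 L 30,20 L 30,40 Z")` yields `[10.0, 20.0, 30.0, 20.0, 30.0, 40.0]`, which slots directly into a COCO annotation's `segmentation` list.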
@cansik, would you be willing to share your script? edit: or share the resulting annotation files?
Also, we meet again! (pyk4a)
Hello @cansik did you convert annotation to coco format? Is it possible to share the script or converted data with coco format?
@lpasselin @Laudarisd I have uploaded my script as a gist here. It expects the dataset in the ./data folder, but you can change that in main. It is also possible to add more detectable object classes at the head of the file; currently only Door is active.
This script was not written with the intention of making it public, so please don't expect it to work perfectly or be simple to navigate.