ShanghaiTechDataset
How to add additional data/images to ShanghaiTech A and B?
Hi!
Can you advise how to add additional data/images to ShanghaiTech A and B? What is the process for labelling the density maps?
The CVPR 2016 paper explains how the density maps are generated. The link to the paper is in the README of the repository.
In brief: you first need to know the position of each head. Then you perform a Gaussian kernel convolution on the map of head positions.
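The two steps above (mark head positions, then apply a Gaussian) can be sketched roughly as follows. This is a minimal sketch and not the authors' code: the function names, the `(x, y)` annotation format, the 3σ kernel radius and the default `sigma` value are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """Return a 2-D Gaussian kernel normalized to sum to 1."""
    if radius is None:
        radius = int(3 * sigma + 0.5)  # assumption: cut the kernel off at about 3 sigma
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def density_map(shape, heads, sigma=4.0):
    """Build a fixed-sigma density map.

    shape: (height, width) of the image.
    heads: iterable of (x, y) head positions in pixels (hypothetical format).
    sigma: Gaussian spread; 4.0 is an arbitrary placeholder.
    """
    h, w = shape
    kernel = gaussian_kernel(sigma)
    r = kernel.shape[0] // 2
    # Pad by the kernel radius so that kernels near the border fit.
    padded = np.zeros((h + 2 * r, w + 2 * r), dtype=np.float64)
    for x, y in heads:
        col, row = int(round(x)), int(round(y))
        if 0 <= row < h and 0 <= col < w:
            # Stamping the kernel equals convolving a single "1" with it.
            padded[row:row + 2 * r + 1, col:col + 2 * r + 1] += kernel
    # Crop back to the image; mass that fell outside the border is dropped.
    return padded[r:r + h, r:r + w]

heads = [(20, 30), (40, 32)]
dm = density_map((64, 64), heads, sigma=2.0)
print(dm.sum())  # close to 2.0, the number of heads
```

Because every kernel sums to 1, the sum of the map stays close to the head count, which is what a counting network is trained to reproduce. Heads very close to the border lose a little mass in this sketch.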
@KunMengcode hi there! Yep, I saw the paper and the code. I have an idea of how to do it now.
Next, I'm unsure how the original authors arrived at a Gaussian "blur" of 5 for each head.
Which method are you talking about?
Most methods should be similar. Taking crowdcount-mcnn as an example, you can see that it only applies a Gaussian. I think the additional method proposed in the paper has little impact on the results.

Section 2.1 of the paper proposes two approaches involving the density map and decides which one to use. Section 2.2 first explains the basic density map generated from the head positions just mentioned, and then applies Gaussian kernel convolution on top of it.

(An intuitive way to see it: if you train directly on the annotated point map, every pixel value is either 0 or 1. It is like eating a whole cake by yourself while the people around you get nothing. That is too extreme, and it makes learning harder for the model. So we use a Gaussian convolution kernel, which "shares" a little of the value at each pixel marked 1 with the other pixels, so that everyone gets a part of it.)

However, this treats each head as independent of the heads around it. In practice, because of perspective and interference from other samples, the head at pixel x_i in a crowd corresponds to regions of different shapes and sizes: heads near the camera are larger and heads far away are smaller. The dataset consists only of images and contains no information such as spatial location, so we need to account for the perspective distortion (homography) in the images.
Choose an appropriate σ according to the size of each head in the picture. The paper notes that the size of a head is usually related to the distance between neighbouring heads. Section 2.2 of the paper, "Density map via geometry-adaptive kernels", therefore proposes a data-adaptive method: determine the spread parameter from the average distance between a person's head and the heads around it. KNN (k nearest neighbours) is used here. So we have the following formula:

$$F(x) = \sum_{i=1}^{N} \delta(x - x_i) * G_{\sigma_i}(x), \qquad \sigma_i = \beta \bar{d}_i$$

where $x_i$ is the position of the i-th head and $\bar{d}_i$ is the average distance from that head to its k nearest neighbours.
Many tests found that the density map generated with β = 0.3 is the best.
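The geometry-adaptive idea can be sketched as below: each head gets its own σ, set to β times the mean distance to its k nearest neighbours. This is an illustrative sketch only. The function names, the default `k=3`, the `fallback` and `min_sigma` values and the brute-force neighbour search are assumptions, not taken from the paper's code.

```python
import numpy as np

def adaptive_sigmas(heads, k=3, beta=0.3, fallback=4.0):
    """sigma_i = beta * mean distance from head i to its k nearest neighbours."""
    pts = np.asarray(heads, dtype=np.float64).reshape(-1, 2)
    n = len(pts)
    if n < 2:
        # No neighbours to measure; use a placeholder value (assumption).
        return np.full(n, fallback)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # all pairwise distances, O(n^2)
    dist.sort(axis=1)                         # column 0 is the distance to itself (0)
    kk = min(k, n - 1)
    return beta * dist[:, 1:kk + 1].mean(axis=1)

def adaptive_density_map(shape, heads, k=3, beta=0.3, min_sigma=1.0):
    """Sum one Gaussian per head, each with its own sigma."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Clamp sigma from below so near-duplicate points do not give sigma ~ 0.
    sigmas = np.maximum(adaptive_sigmas(heads, k, beta), min_sigma)
    out = np.zeros((h, w), dtype=np.float64)
    for (x, y), s in zip(heads, sigmas):
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * s ** 2))
        out += g / g.sum()  # normalize inside the image: each head adds exactly 1
    return out
```

Evaluating a full-image Gaussian per head is simple but slow for large crowds. A KD-tree for the neighbour search and a truncated kernel per head would be the usual speed-ups.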
With a Gaussian distribution, the closer a pixel is to the input point, the stronger the correlation, so the pixels closest to the "1" receive the largest shares. Suppose there is a [3,3] image matrix in which the central pixel is "1" and all other pixels are 0. The values assigned to the upper, lower, left and right pixels are equal, because each is 1 pixel unit from the centre; call this value a. The values assigned to the upper-left, lower-left, upper-right and lower-right pixels must also be equal, because each is √2 pixel units from the centre; call this value b. After the centre has shared out part of its "1", what remains at the centre is c. The nine pixel values therefore sum to 1, that is, 4a+4b+c=1, and c>a>b.

But how big is the difference between a and b? It could be a=0.1 and b=0.05, or a=0.1 and b=0.025. In both cases the difference in distance is the same, √2 − 1, yet the difference in the size of the slice of cake is not. This gap is determined by the σ of the Gaussian function: the larger σ is, the smaller the difference. So it is reasonable to determine σ from the distance between heads. A head that covers a larger pixel area should have its value spread over a wider region, so that the density map reflects the size of the head.

The adaptive σ is an extension of this idea.
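The 3×3 cake example can be checked numerically. The sketch below assumes the Gaussian is truncated to the 3×3 window and renormalized so the nine values sum to 1; `shares_3x3` is a hypothetical helper name.

```python
import math

def shares_3x3(sigma):
    """Return (a, b, c): edge-neighbour, corner-neighbour and centre shares."""
    w_c = 1.0                                  # centre, distance 0
    w_a = math.exp(-1.0 / (2 * sigma ** 2))    # distance 1
    w_b = math.exp(-2.0 / (2 * sigma ** 2))    # distance sqrt(2), squared distance 2
    total = w_c + 4 * w_a + 4 * w_b
    return w_a / total, w_b / total, w_c / total

for sigma in (0.5, 1.0, 3.0):
    a, b, c = shares_3x3(sigma)
    print(f"sigma={sigma}: a={a:.4f} b={b:.4f} c={c:.4f} a/b={a / b:.2f}")
```

Under this assumption the ratio a/b equals exp(1/(2σ²)), so it shrinks towards 1 as σ grows. That matches the statement that a larger σ gives a smaller difference between a and b.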