pix2pixHD
Clarification question: Instance mapping appears to be missing
Hi,
I'm trying to understand how instance boundary maps are used by your code to improve the synthesized output of the generator G.
This excerpt is from section 3.3 of your paper and is very clear. I agree with it as well.
"Instead, we argue that the most important information the instance map provides, which is not available in the semantic label map, is the object boundary. For example, when a number of same-class objects are next to one another, looking at the semantic label map alone cannot tell them apart. This is especially true for the street scene since many parked cars or walking pedestrians are often next to one another, as shown in Fig. 3a. However, with the instance map, separating these objects becomes an easier task."
I was able to successfully run your code and synthesize output as expected. However, I am confused when I look more closely at the example inputs provided to the generator G.
The instance boundary maps (found in ./datasets/cityscapes/test_inst) don't appear to provide boundary information. For example, frankfurt_000001_047552_gtFine_instanceIds.png (below) doesn't define boundaries of the vehicles parked on either side of the street.

In other examples, boundary information does appear, but only for a small part of the image (frankfurt_000001_054640_gtFine_InstanceIds.png). In the image below, the red box highlights where boundaries are visible, but they do not appear consistently throughout the image.

I used GIMP to inspect the hex color codes to make sure there is no tiny variation that my eyes cannot detect. I used this technique to inspect the label map which contains overt and subtle color labeling distinctions.
Is this because that file is not an instance boundary map? If so, is this file the concatenation of the one-hot representation of the semantic label map and the boundary map? If not, is it the channel-wise concatenation of the instance boundary map, the semantic label map, and the real image?
They are "_labelIds" and "_instanceIds" and this is why I am confused.
Please help me clear up my confusion; otherwise, wouldn't the generator G consider these groups of cars and people to be a single object during synthesis?
Thank you for sharing your hard work. I really am enjoying experimenting with it.
The instance IDs for different cars should be different. Please note that the IDs are greater than 256 (e.g. 26001, 26002, ...), so please keep that in mind when reading in the image.
Thanks for responding. Can you tell me where in the image you are defining the IDs? I cannot locate any values that match your description.
The IDs are in *_InstanceIds.png. The way Cityscapes encodes the instance IDs is as follows: for classes that have instances labeled (e.g. cars and pedestrians), the instance ID divided by 1000 is the class ID, while the remainder when divided by 1000 is the ID for the particular instance. For example, 26003 means it's a car (label 26) with instance ID 003. For classes that don't have instances labeled (e.g. trees), the instance ID is just the same as the label ID.
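As an illustration of that encoding (just a sketch based on the description above, not code from the repo), the class ID and per-instance ID can be recovered with integer division and the remainder:

instance_id = 26003                  # example value read from an *_instanceIds.png
if instance_id >= 1000:              # classes with labeled instances (cars, pedestrians, ...)
    class_id = instance_id // 1000   # 26 -> car
    object_id = instance_id % 1000   # 3  -> the third car instance in the image
else:                                # classes without labeled instances (e.g. trees)
    class_id = instance_id           # the value is just the label ID
    object_id = None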
I appreciate the feedback, but I don't see any 4- or 5-digit label info embedded in the PNGs anywhere. I am not able to get the Cityscapes dataset for comparison, so I'm only looking at the images included in the /datasets/cityscapes/*inst folders. They look like the images I posted above. Those colors don't match your example format. GIMP tells me they are hex 656565 and pixel value 101.

I don't know where in your code or what tool you use to inspect and find those instance IDs, but I'm pretty sure I'm not seeing what you're seeing.
I tried using my own images to test the netG checkpoint, but I can't segment and label my images according to how your code expects them. Perhaps I could build a segnet into the front of your code so that future users don't have to worry about this problem and can just provide raw images for testing and training?
I am trying this with my own dataset and am also finding the prep needed and necessary encoding to be unclear. A segnet in front would be absolutely fabulous!
Hi @tcwang0509, thanks for all the hard work, very impressive.
I think the confusion is created by the mismatched data between the paper and the Github example.
What format should we use?
Thanks
@codeisnotcode do you have instance maps with your own dataset? If not, you can just specify '--no_instance'. @aviel08 the colorful label maps are just for representation. When feeding to the network, you need to have images similar to the github files.
@hoomanNo5 The Cityscapes instanceIds images are encoded by instance ID. In short, different classes of objects have different pixel values, and different instances of the same class also have different pixel values. Therefore, following the principle described in the paper, you can write your own code to generate the boundary map. I tried it, and the result is as follows:
InstanceIds.png

the pixel value range

boundary map

@CarryJzzZ Hi, I think your boundary map looks very good! Can I ask how you did this, and which kind of algorithm you used? I have tried many edge detection algorithms and still can't achieve this effect. I would be very thankful if you could share it!
@CarryJzzZ Thank you for making it clear. Just one question: how do you read the instance map? I read it using OpenCV as usual and my maximum value is 127. I believe I am not reading it correctly, so could you please share how you got the max pixel value of 33001?
Finally I found it! @hoomanNo5 You just need to read the image using OpenCV with option cv2.IMREAD_UNCHANGED or cv2.IMREAD_ANYDEPTH. Like this:
inst = cv2.imread(join(path_inst, inst_name), cv2.IMREAD_ANYDEPTH)
And you will see the true values of the pixels. Otherwise OpenCV will compress the values into the 8-bit [0-255] range, and you will see all cars sharing the same pixel value.
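To see the difference concretely, here is a small sketch (the file name is taken from the example earlier in this thread; the exact maximum value depends on the image):

import cv2

path = 'datasets/cityscapes/test_inst/frankfurt_000001_047552_gtFine_instanceIds.png'
default_read = cv2.imread(path)                   # default flag: values converted to 8-bit, 3 channels
raw_read = cv2.imread(path, cv2.IMREAD_ANYDEPTH)  # keeps the original 16-bit depth
print(default_read.dtype, default_read.max())     # uint8, at most 255
print(raw_read.dtype, raw_read.max())             # uint16, instance IDs such as 26xxx are preserved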
I have spent my whole week trying to generate the instance map and couldn't understand why it didn't work. It turns out that I hadn't correctly understood how the instance map was created.
@594cp @doantientai I prefer to use Pillow for image processing. For this issue, I simply followed the method used in the paper; it is a simple implementation and may have some bugs, so feel free to debug it and find a better way to work with it.
import os
import numpy as np
from PIL import Image

def boundary(raw_input, save_path, save_name):
    """
    Calculate the instance boundary mask and save it.
    :param raw_input: path to a *_instanceIds.png image
    :param save_path: city name (sub-directory under RAW_INPUT_PATH)
    :param save_name: file name for the saved boundary mask
    """
    # read the instance map; Pillow keeps the original pixel values
    instance_mask = Image.open(raw_input)
    width = instance_mask.size[0]
    height = instance_mask.size[1]
    mask_array = np.array(instance_mask)
    # define the boundary mask: 0 = not a boundary, 255 = boundary
    boundary_mask = np.zeros((height, width), dtype=np.uint8)
    # a pixel is on a boundary if its instance ID differs from any of its 4 nearest neighbours
    for i in range(1, height - 1):
        for j in range(1, width - 1):
            if mask_array[i, j] != mask_array[i - 1, j] \
                    or mask_array[i, j] != mask_array[i + 1, j] \
                    or mask_array[i, j] != mask_array[i, j - 1] \
                    or mask_array[i, j] != mask_array[i, j + 1]:
                boundary_mask[i, j] = 255
    boundary_image = Image.fromarray(boundary_mask)
    # boundary_image.show()
    # RAW_INPUT_PATH is a module-level constant pointing at the dataset root
    boundary_image.save(os.path.join(RAW_INPUT_PATH, save_path, save_name))
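For full-resolution Cityscapes images the nested Python loops above can be slow. Here is a vectorized variant using numpy slicing (a sketch of my own, not from the original post; boundary_fast is just an illustrative name). It computes the same 4-neighbour boundary mask and, like the loop version, leaves the outermost 1-pixel border unmarked:

import numpy as np
from PIL import Image

def boundary_fast(raw_input):
    # read the instance map; Pillow preserves the original instance IDs
    mask = np.array(Image.open(raw_input))
    boundary = np.zeros(mask.shape, dtype=np.uint8)
    # compare every interior pixel with its up/down/left/right neighbours
    center = mask[1:-1, 1:-1]
    diff = (center != mask[:-2, 1:-1]) | (center != mask[2:, 1:-1]) \
         | (center != mask[1:-1, :-2]) | (center != mask[1:-1, 2:])
    boundary[1:-1, 1:-1] = diff.astype(np.uint8) * 255
    return Image.fromarray(boundary)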
@CarryJzzZ Thank you, that is really helpful!
Hi, I am wondering how I can get the instanceId images for my own dataset.
@zhangdan8962 have you found the answer? I have my own dataset and I created one-channel instance maps, but it gives me an error. What is the shape of the instance maps? Is it the same as the input?