VQA_ReGAT

The semantic_embedding and spatic_embedding types.

Open haoopan opened this issue 4 years ago • 4 comments

Hi, this is great work on VQA. I haven't downloaded the datasets, so I want to know the types of semantic_embedding and spatic_embedding: are they one-hot embeddings, word embeddings, or features extracted from a model? I'm looking forward to your reply, thanks!

haoopan avatar Jan 15 '21 14:01 haoopan

Hi, thanks for your interest in this work, and sorry for the late reply. They are one-hot embeddings.
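For readers wondering what that means concretely, here is a minimal sketch of one-hot encoding a relation type id. This is illustrative only, not the actual ReGAT code; `num_classes` and the example label are made up for the demonstration:

```python
import numpy as np

def one_hot(label, num_classes):
    """Return a vector of zeros with a single 1.0 at index `label`."""
    vec = np.zeros(num_classes, dtype=np.float32)
    vec[label] = 1.0
    return vec

# Hypothetical example: relation type 4 out of 15 semantic relation classes
print(one_hot(4, 15))  # 1.0 at index 4, zeros elsewhere
```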

linjieli222 avatar Jan 21 '21 05:01 linjieli222

[image] Dear scholar, does your code in pos_embedding.py produce the same type-id numbers as shown in the attached picture?

alice-cool avatar Apr 15 '21 02:04 alice-cool

Dear scholar, this is your code in pos_emb.py:

                    y_diff = center_y[i] - center_y[j]
                    x_diff = center_x[i] - center_x[j]
                    diag = math.sqrt((y_diff)**2 + (x_diff)**2)
                    if diag < 0.5 * image_diag:
                        sin_ij = y_diff/diag
                        cos_ij = x_diff/diag
                        if sin_ij >= 0 and cos_ij >= 0:
                            label_i = np.arcsin(sin_ij)
                            label_j = 2*math.pi - label_i
                        elif sin_ij < 0 and cos_ij >= 0:
                            label_i = np.arcsin(sin_ij)+2*math.pi
                            label_j = label_i - math.pi
                        elif sin_ij >= 0 and cos_ij < 0:
                            label_i = np.arccos(cos_ij)
                            label_j = 2*math.pi - label_i
                        else:
                            label_i = -np.arccos(sin_ij)+2*math.pi
                            label_j = label_i - math.pi
                        adj_matrix[i, j] = int(np.ceil(label_i/(math.pi/4)))+3
                        adj_matrix[j, i] = int(np.ceil(label_j/(math.pi/4)))+3

But I think if we follow the type-id numbering in the attached picture, the code should perhaps be the following:

                        if sin_ij >= 0 and cos_ij >= 0:# j is in the second Quadrant, i is the reference center
                            label_i = math.pi - np.arcsin(sin_ij)
                            label_j = 2*math.pi - np.arcsin(sin_ij)
                            print(math.degrees(label_i))
                            print(math.degrees(label_j))
                        elif sin_ij < 0 and cos_ij >= 0:#j is in  the third Quadrant, i is the reference center
                            label_i = -np.arcsin(sin_ij)+math.pi
                            label_j = np.arccos(cos_ij)
                            print(math.degrees(label_i))
                            print(math.degrees(label_j))
                        elif sin_ij >= 0 and cos_ij < 0: #j is in the first Quadrant, i is the reference center
                            label_i = np.arcsin(sin_ij)
                            label_j = math.pi + np.arcsin(sin_ij)
                            print(math.degrees(label_i))
                            print(math.degrees(label_j))
                        else:# j is in the fourth Quadrant, i is the reference center
                            label_i = np.arcsin(sin_ij)+2*math.pi
                            label_j = math.pi + np.arcsin(sin_ij)
                            print(math.degrees(label_i))
                            print(math.degrees(label_j))
                        adj_matrix[i, j] = int(np.ceil(label_i/(math.pi/4)))+3
                        adj_matrix[j, i] = int(np.ceil(label_j/(math.pi/4)))+3

For spatial relations, since we do not use their semantic meaning during graph attention, the order of the labels does not matter. But you are right, the labels are not exactly the same as the ones in the pictures.
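To make the binning in the quoted snippet easier to reason about: the direction from box i to box j is quantized into pi/4-wide sectors, and the sector index plus an offset of 3 becomes the spatial relation label. Below is a minimal self-contained sketch (not the repository code) that uses `math.atan2` to get the full-circle angle directly; the original code reconstructs the angle per quadrant, so exact labels at bin boundaries may differ:

```python
import math
import numpy as np

def angle_bin(y_diff, x_diff):
    """Quantize the direction (y_diff, x_diff) from box i to box j
    into one of eight pi/4-wide sectors, offset by 3 as in the
    quoted pos_emb.py snippet."""
    angle = math.atan2(y_diff, x_diff)  # in (-pi, pi]
    if angle <= 0:
        angle += 2 * math.pi            # map to (0, 2*pi]
    return int(np.ceil(angle / (math.pi / 4))) + 3

# Direction straight "up" (y_diff > 0, x_diff = 0): angle = pi/2,
# sector ceil(2.0) = 2, so the label is 2 + 3 = 5.
print(angle_bin(1.0, 0.0))  # -> 5
```

Note that the reverse direction j -> i differs by pi, i.e. by exactly four sectors (mod 8), which is why the snippet can fill `adj_matrix[i, j]` and `adj_matrix[j, i]` from the same pair of angles.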

linjieli222 avatar Apr 15 '21 17:04 linjieli222

[three images] 0: wearing, 1: holding, 2: sitting on, 3: standing on, 4: riding, 5: eating, 6: hanging from, 7: carrying, 8: attached to, 9: walking on, 10: playing, 11: covering, 12: lying on, 13: watching, 14: looking at. The predicted relations are 4: riding and 10: playing. I think it must be my error, but I don't know where the error is.

Remember that our semantic relation labels are predictions from a neural network, not ground-truth labels, so mistakes in the predictions are quite likely. Also, can you remind me where you got the label-to-relation mapping? It has been a while since I worked on this project; I just want to make sure we are on the same page.

linjieli222 avatar Apr 15 '21 18:04 linjieli222