VQA_ReGAT
The semantic_embedding and spatic_embedding types.
Hi, this is great work on VQA. I didn't download the datasets, so I would like to know the types of semantic_embedding and spatic_embedding: are they one-hot embeddings, word embeddings, or features extracted from a model? I'm looking forward to your reply, thanks!
Hi, thanks for your interest in this work, and sorry for the late reply. They are one-hot embeddings.
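For readers who have not downloaded the data, here is a minimal sketch of what a one-hot encoding of relation ids looks like (my own illustration with a hypothetical class count and ids, not the repository's code):

```python
# Minimal sketch: turn integer relation ids into one-hot vectors.
# The class count and the example ids are hypothetical.
import torch
import torch.nn.functional as F

num_classes = 15                      # e.g. 15 semantic relation labels
labels = torch.tensor([4, 10, 0])     # hypothetical relation ids
one_hot = F.one_hot(labels, num_classes=num_classes).float()
print(one_hot.shape)                  # torch.Size([3, 15])
print(one_hot[0])                     # 1.0 at index 4, 0.0 elsewhere
```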
Dear scholar,
Does your code in pos_embedding.py produce the same type id numbers as the picture below?
Dear scholar, this is your code in pos_emb.py:

```python
y_diff = center_y[i] - center_y[j]
x_diff = center_x[i] - center_x[j]
diag = math.sqrt((y_diff)**2 + (x_diff)**2)
if diag < 0.5 * image_diag:
    sin_ij = y_diff/diag
    cos_ij = x_diff/diag
    if sin_ij >= 0 and cos_ij >= 0:
        label_i = np.arcsin(sin_ij)
        label_j = 2*math.pi - label_i
    elif sin_ij < 0 and cos_ij >= 0:
        label_i = np.arcsin(sin_ij) + 2*math.pi
        label_j = label_i - math.pi
    elif sin_ij >= 0 and cos_ij < 0:
        label_i = np.arccos(cos_ij)
        label_j = 2*math.pi - label_i
    else:
        label_i = -np.arccos(sin_ij) + 2*math.pi
        label_j = label_i - math.pi
    adj_matrix[i, j] = int(np.ceil(label_i/(math.pi/4))) + 3
    adj_matrix[j, i] = int(np.ceil(label_j/(math.pi/4))) + 3
```

But I think that, to follow the type id numbering in the picture below, it should instead be the following:
```python
if sin_ij >= 0 and cos_ij >= 0:
    # j is in the second quadrant, i is the reference center
    label_i = math.pi - np.arcsin(sin_ij)
    label_j = 2*math.pi - np.arcsin(sin_ij)
    print(math.degrees(label_i))
    print(math.degrees(label_j))
elif sin_ij < 0 and cos_ij >= 0:
    # j is in the third quadrant, i is the reference center
    label_i = -np.arcsin(sin_ij) + math.pi
    label_j = np.arccos(cos_ij)
    print(math.degrees(label_i))
    print(math.degrees(label_j))
elif sin_ij >= 0 and cos_ij < 0:
    # j is in the first quadrant, i is the reference center
    label_i = np.arcsin(sin_ij)
    label_j = math.pi + np.arcsin(sin_ij)
    print(math.degrees(label_i))
    print(math.degrees(label_j))
else:
    # j is in the fourth quadrant, i is the reference center
    label_i = np.arcsin(sin_ij) + 2*math.pi
    label_j = math.pi + np.arcsin(sin_ij)
    print(math.degrees(label_i))
    print(math.degrees(label_j))
adj_matrix[i, j] = int(np.ceil(label_i/(math.pi/4))) + 3
adj_matrix[j, i] = int(np.ceil(label_j/(math.pi/4))) + 3
```
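Both snippets quantize the same quantity, the angle of the vector between the two box centers, into eight 45° sectors (ids 4 to 11). As a neutral reference, here is a self-contained sketch (my own code, not from pos_emb.py) that computes such a sector id with math.atan2 and avoids the per-quadrant case analysis:

```python
# Sketch (an assumption, not the repository's implementation): bin the
# direction from box center j to box center i into eight 45-degree
# sectors, producing spatial relation ids 4..11.
import math

def direction_label(cx_i, cy_i, cx_j, cy_j):
    # Same diff convention as the snippets above: center_i - center_j.
    angle = math.atan2(cy_i - cy_j, cx_i - cx_j)  # in (-pi, pi]
    if angle <= 0:
        angle += 2 * math.pi                      # map into (0, 2*pi]
    return int(math.ceil(angle / (math.pi / 4))) + 3

print(direction_label(2, 1, 0, 0))  # 4: angle is about 26.6 degrees
print(direction_label(0, 0, 2, 1))  # 8: reversed pair, opposite sector
```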
For spatial relations, since we do not use their semantic meaning during graph attention, the order of the labels does not matter. But you are right, the labels are not exactly the same as the ones in the pictures.
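To illustrate why the ordering is irrelevant, here is a minimal sketch (my reading of the mechanism, not the repository's code): the spatial labels act only as discrete edge-type ids that index a learned embedding table, so any consistent relabeling merely permutes which rows get learned.

```python
# Minimal sketch: edge-type ids index a learned embedding table.
# The sizes here are hypothetical.
import torch
import torch.nn as nn

num_edge_types = 12                      # e.g. spatial relation ids 0..11
edge_embed = nn.Embedding(num_edge_types, 16)

edge_labels = torch.tensor([4, 7, 11])   # ids under one labeling convention
print(edge_embed(edge_labels).shape)     # torch.Size([3, 16])
# Relabeling consistently (e.g. swapping ids 4 and 7 everywhere) only
# swaps which rows of the table the model trains for those edge types.
```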
[attached images]
0: wearing, 1: holding, 2: sitting on, 3: standing on, 4: riding, 5: eating, 6: hanging from, 7: carrying, 8: attached to, 9: walking on, 10: playing, 11: covering, 12: lying on, 13: watching, 14: looking at

The relation is 4: riding, 10: playing. I think it must be my error, but I don't know where the error is.
Remember that our semantic relation labels are predictions from a neural network, so they are not ground-truth labels, which means there are very likely mistakes in the predictions. Also, can you remind me where you got the label-to-relation mapping? It has been a while since I worked on this project; I just want to make sure we are on the same page.
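For reference, a small decoding helper using the mapping quoted above (the dictionary entries are copied from this thread; the helper itself is my own illustration, not part of the repository):

```python
# Hypothetical helper: decode predicted semantic-relation ids to names.
ID_TO_RELATION = {
    0: "wearing", 1: "holding", 2: "sitting on", 3: "standing on",
    4: "riding", 5: "eating", 6: "hanging from", 7: "carrying",
    8: "attached to", 9: "walking on", 10: "playing", 11: "covering",
    12: "lying on", 13: "watching", 14: "looking at",
}

for pred in (4, 10):
    print(pred, "->", ID_TO_RELATION[pred])  # 4 -> riding, 10 -> playing
```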