relbench
Why is `input_id` not contained by `n_id`?
Hi, thanks for sharing the benchmark.
In the rel-attendance data, what is the relationship between `input_id` and `n_id` in the heterogeneous graph sampled by NeighborLoader?
From my understanding, `input_id` corresponds to the global index of the `input_nodes`, while `n_id` corresponds to the global node index of every sampled node. Therefore, `n_id` should contain `input_id`. However, when I printed them out, this was not the case.
Can you please give me some hints to understand this?
@rusty1s can you please take a look?
`input_id` refers to the ID of the training table, i.e., the row of the training table that the example subgraph was sampled from.
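To make the distinction concrete, here is a minimal toy sketch in plain Python. It does not call NeighborLoader; the table contents, the node indices, and the helper names (`train_table`, `seed_nodes`, `sampled_neighbors`) are all made up for illustration. It assumes the seed nodes appear first in `n_id`, followed by the sampled neighbors.

```python
# Toy illustration in plain Python (no PyG calls); all names and values are made up.

# Training table: one row per (user, seed time) example. The row position is
# what `input_id` indexes; `user_id` is a global node index in the users table.
train_table = [
    {"user_id": 42, "seed_time": 10},
    {"user_id": 7, "seed_time": 10},
    {"user_id": 42, "seed_time": 20},  # same user, later seed time
    {"user_id": 13, "seed_time": 20},
]

# The seed nodes handed to the loader: one entity index per training row.
input_nodes = [row["user_id"] for row in train_table]

# Suppose one mini-batch draws training rows 2 and 3.
input_id = [2, 3]
seed_nodes = [input_nodes[i] for i in input_id]

# Hypothetical sampled neighbors of the seeds, appended after the seeds.
sampled_neighbors = [7, 99]
n_id = seed_nodes + sampled_neighbors

batch_size = len(input_id)
# The first `batch_size` entries of n_id are the entities of the drawn rows...
seeds_come_first = n_id[:batch_size] == seed_nodes
# ...but the row indices themselves need not appear in n_id at all.
input_id_in_n_id = all(i in n_id for i in input_id)

print(n_id, seeds_come_first, input_id_in_n_id)
```

In this sketch the row indices 2 and 3 point at users 42 and 13, so `n_id` starts with those user indices rather than with the values of `input_id`.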
Matthias, thanks for your immediate reply! I have two more questions.
(1) Since `input_id` refers to the training table's ID, can we access `input_id` for other tables as well, rather than only for the entity table? In rel-attendance, `input_id` seems to exist only for users.
(2) I am confused about the GNN model's forward function. Since you directly slice `x_dict[entity_table][: seed_time.size(0)]` for the final output prediction, does this mean that the first `seed_time.size(0)` entities of the entity table are the target entities that correspond to the seed times?
Look forward to your response!
- It only exists for the entity table, since this is the seed table that sampling starts from.
- Yes
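As a rough illustration of the slicing discussed in (2), the following toy sketch in plain Python shows how the first `seed_time.size(0)` outputs would line up with the labels looked up through `input_id`. The values and the names `labels`, `node_outputs`, and `num_seeds` are hypothetical stand-ins, not the actual model code.

```python
# Toy illustration in plain Python; names and values are hypothetical.

labels = [0.0, 1.0, 1.0, 0.0]  # one target per training-table row
input_id = [2, 3]              # training rows drawn for this batch
seed_time = [20, 20]           # one seed time per drawn row

# Stand-in for x_dict[entity_table]: one output per sampled entity node,
# with the seed entities first and sampled neighbors after them.
node_outputs = [0.9, 0.1, 0.5, 0.3]

num_seeds = len(seed_time)     # plays the role of seed_time.size(0)

# Keep only the outputs of the seed entities.
predictions = node_outputs[:num_seeds]

# Look up the matching targets by training-table row.
targets = [labels[i] for i in input_id]

# Each prediction is paired with the label of the row it was sampled from.
pairs = list(zip(predictions, targets))
print(pairs)
```

Outputs beyond the first `num_seeds` belong to neighbor entities that were only sampled as context, so they have no training-table row and are dropped before the loss.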