STaRK-Prime answers wrong?
While exploring the STaRK-Prime dataset, I looked at a few of the human-generated questions and found a couple whose answers seem strange: the answer to the question is the topic entity itself.
For example, take question index 47 in the human-generated STaRK-Prime set: "What diseases is exposure to 2,3',4,4',5-pentachlorobiphenyl associated with?". The answer ID is 61686, and the name of node 61686 is "2,3',4,4',5-pentachlorobiphenyl", which already appears in the question. I see the same kind of result for question index 62.
Is this the expected behavior, and if so, could you explain why? I would have expected answers that differ from the topic entity, especially in the human-generated set (for instance, the diseases listed under linked_to in the output below).
You can reproduce this by running the following code:
from stark_qa import load_qa, load_skb

dataset_name = 'prime'
# Load the human-generated evaluation questions and the knowledge base
qa_dataset = load_qa(dataset_name, human_generated_eval=True)
skb = load_skb(dataset_name, download_processed=False, root='.')

# Inspect question 47; the third element of the tuple holds the answer node IDs
qa_dataset[47]
# Output
("What diseases is exposure to 2,3',4,4',5-pentachlorobiphenyl associated with?",
47,
[61686],
None)
print(skb.get_doc_info(61686, add_rel=True))
# Output
- name: 2,3',4,4',5-pentachlorobiphenyl
- type: exposure
- source: CTD
- relations:
parent-child: {exposure: (2,2',3',4,4',5-hexachlorobiphenyl, 2,4,4',5-tetrachlorobiphenyl, Endocrine Disruptors, Environmental Pollutants, Pesticides, Polychlorinated Biphenyls, 2,2',3,3',4,4',5-heptachlorobiphenyl, 2,3,3',4,4',5-hexachlorobiphenyl, 2,4,5,2',4',5'-hexachlorobiphenyl, Hydrocarbons, Chlorinated, Organic Chemicals, Thyroxine, Triiodothyronine),}
interacts_with: {gene/protein: (TSHB, SERPINA7),biological_process: (thyroid hormone metabolic process, cognition, regulation of thyroid-stimulating hormone secretion, production of molecular mediator of immune response, regulation of bone mineralization, hypermethylation of CpG island, male meiosis chromosome separation),}
linked_to: {disease: (osteoporosis, metabolic syndrome X, non-Hodgkin lymphoma, respiratory tract infectious disease, fatty liver disease, colorectal neoplasm),}
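To see how many questions are affected beyond indices 47 and 62, a check along the following lines might help. This is only a sketch: `find_self_answering` and `name_of` are hypothetical helpers rather than part of `stark_qa`, and the second toy question and its answer ID are made up for illustration. The function flags every question whose answer node name appears verbatim (case-insensitively) in the question text.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Same shape as the tuple shown above: (query, query ID, answer node IDs, extra)
QAItem = Tuple[str, int, Sequence[int], object]


def find_self_answering(
    items: Iterable[QAItem],
    name_of: Callable[[int], str],
) -> List[Tuple[int, int, str]]:
    """Return (query ID, answer ID, node name) for answers named in their own query.

    `name_of` is a hypothetical lookup from node ID to node name,
    supplied by the caller.
    """
    hits = []
    for query, query_id, answer_ids, _extra in items:
        query_folded = query.casefold()
        for answer_id in answer_ids:
            name = name_of(answer_id).strip()
            # Plain substring match; skip empty names to avoid trivial hits
            if name and name.casefold() in query_folded:
                hits.append((query_id, answer_id, name))
    return hits


# Toy data: the first item mirrors question 47 above, the second is invented.
toy_items = [
    ("What diseases is exposure to 2,3',4,4',5-pentachlorobiphenyl associated with?",
     47, [61686], None),
    ("Which gene interacts with this exposure?", 1, [7], None),
]
toy_names = {61686: "2,3',4,4',5-pentachlorobiphenyl", 7: "TSHB"}

print(find_self_answering(toy_items, toy_names.__getitem__))
```

Against the real data, `items` could be built by indexing `qa_dataset` as in the snippet above, and `name_of` could read the `- name:` line from `skb.get_doc_info(...)`. Both rest on the output format shown in this issue, so treat them as assumptions.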