
STaRK-Prime answers wrong?

Open LacombeLouis opened this issue 7 months ago • 4 comments

While exploring the STaRK-Prime dataset, I looked at a few of the human-generated questions and found a couple of answers that seem strange: the answer to the question is the topic entity itself.

For example, take question index 47 in the human-generated STaRK-Prime set: "What diseases is exposure to 2,3',4,4',5-pentachlorobiphenyl associated with?". The answer ID is 61686, and the name of node 61686 is "2,3',4,4',5-pentachlorobiphenyl", which is already mentioned in the question. I see the same kind of result for question index 62.

Is this the expected behavior? If so, could you explain why? I would have expected answers that differ from the topic entity, especially in the human-generated set.

You can re-create this by running the following code:

from stark_qa import load_qa, load_skb

dataset_name = 'prime'

qa_dataset = load_qa(dataset_name, human_generated_eval=True)
idx_split = qa_dataset.get_idx_split()

skb = load_skb(dataset_name, download_processed=False, root='.')

print(qa_dataset[47])
# Output
# ("What diseases is exposure to 2,3',4,4',5-pentachlorobiphenyl associated with?",
#  47,
#  [61686],
#  None)

print(skb.get_doc_info(61686, add_rel=True))
# Output
# - name: 2,3',4,4',5-pentachlorobiphenyl
# - type: exposure
# - source: CTD
# - relations:
#   parent-child: {exposure: (2,2',3',4,4',5-hexachlorobiphenyl, 2,4,4',5-tetrachlorobiphenyl, Endocrine Disruptors, Environmental Pollutants, Pesticides, Polychlorinated Biphenyls, 2,2',3,3',4,4',5-heptachlorobiphenyl, 2,3,3',4,4',5-hexachlorobiphenyl, 2,4,5,2',4',5'-hexachlorobiphenyl, Hydrocarbons, Chlorinated, Organic Chemicals, Thyroxine, Triiodothyronine),}
#   interacts_with: {gene/protein: (TSHB, SERPINA7),biological_process: (thyroid hormone metabolic process, cognition, regulation of thyroid-stimulating hormone secretion, production of molecular mediator of immune response, regulation of bone mineralization, hypermethylation of CpG island, male meiosis chromosome separation),}
#   linked_to: {disease: (osteoporosis, metabolic syndrome X, non-Hodgkin lymphoma, respiratory tract infectious disease, fatty liver disease, colorectal neoplasm),}
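
To find other questions with the same pattern, a check along these lines might help. It is only a rough sketch: `find_topic_entity_answers` and `name_of` are hypothetical names of my own, not part of `stark_qa`, and the second record in the toy data is made up for contrast. It assumes each QA record is a `(query, query_id, answer_ids, meta)` tuple, as in the output above.

```python
from typing import Callable, Iterable, List, Tuple

# Assumed record shape, mirroring the tuple printed above:
# (query, query_id, answer_ids, meta)
QARecord = Tuple[str, int, List[int], object]


def find_topic_entity_answers(
    records: Iterable[QARecord],
    name_of: Callable[[int], str],
) -> List[Tuple[int, int, str]]:
    """Return (query_id, node_id, name) for every answer node whose
    name appears verbatim (case-insensitive) in its own question."""
    hits = []
    for query, query_id, answer_ids, _meta in records:
        query_lower = query.lower()
        for node_id in answer_ids:
            name = name_of(node_id)
            if name and name.lower() in query_lower:
                hits.append((query_id, node_id, name))
    return hits


# Toy data: the first record is the one from this issue; the second
# record and node ID 1 are hypothetical, added only as a non-matching case.
names = {
    61686: "2,3',4,4',5-pentachlorobiphenyl",
    1: "osteoporosis",
}
records = [
    ("What diseases is exposure to 2,3',4,4',5-pentachlorobiphenyl associated with?",
     47, [61686], None),
    ("Which disease is linked to low bone density?", 0, [1], None),
]

hits = find_topic_entity_answers(records, names.get)
print(hits)
```

On the real data, `records` would come from iterating over `qa_dataset`, and `name_of` could, for instance, read the `- name:` line from `skb.get_doc_info(node_id)`; that relies on the output format shown above and is an assumption on my part. A plain substring match will miss paraphrased entity names, so this only gives a lower bound on how many questions are affected.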

LacombeLouis, Jul 03 '24 11:07