ontogpt
`named_entities` in `output.txt` contains all entities from previous documents when run on a directory
Related to the change introduced in #304.
For each new YAML output document appended to the `output.txt` file, the `extracted_object` item is correct (it only contains information from the current input doc), but the `named_entities` object carries over everything from the previous documents, so it accumulates entities that aren't in the input doc in question.
EDIT: Expected behavior: the `named_entities` item should only contain entities from the current doc.
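To make the reported vs. expected behavior concrete, here is a minimal sketch of the pattern that seems to be at play. `ToyExtractor` and its `reset` flag are hypothetical names invented for this illustration and are not OntoGPT's actual code; the sketch only assumes that one extractor object with a shared entity list is reused across all documents in a directory.

```python
from dataclasses import dataclass, field


@dataclass
class ToyExtractor:
    """Hypothetical stand-in for an extraction engine; not OntoGPT's real class."""

    named_entities: list = field(default_factory=list)

    def extract(self, labels, reset=True):
        # Clearing the shared list per document is the expected behavior;
        # reset=False reproduces the accumulation described in this issue.
        if reset:
            self.named_entities = []
        for label in labels:
            self.named_entities.append({"id": f"AUTO:{label}", "label": label})
        # Return a copy so later documents cannot mutate earlier results.
        return {"named_entities": list(self.named_entities)}


docs = [["SIPK", "WIPK"], ["NtPat1"]]

buggy = ToyExtractor()
buggy_out = [buggy.extract(d, reset=False) for d in docs]

fixed = ToyExtractor()
fixed_out = [fixed.extract(d) for d in docs]

print(len(buggy_out[1]["named_entities"]))  # 3: entities from both documents
print(len(fixed_out[1]["named_entities"]))  # 1: only the current document
```

With `reset=False` the second document's list also contains the first document's entities, which matches the growth visible in the full example.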
A full example:
---
input_text: In tobacco, two mitogen-activated protein (MAP) kinases, designated salicylic
acid (SA)-induced protein kinase (SIPK) and wounding-induced protein kinase (WIPK)
are activated in a disease resistance-specific manner following pathogen infection
or elicitor treatment. To investigate whether nitric oxide (NO), SA, ethylene, or
jasmonic acid (JA) are involved in this phenomenon, the ability of these defense
signals to activate these kinases was assessed. Both NO and SA activated SIPK; however,
they did not activate WIPK. Additional analyses with transgenic NahG tobacco revealed
that SA is required for the NO-mediated induction of SIPK. Neither JA nor ethylene
activated SIPK or WIPK. Thus, SIPK may function downstream of SA in the NO signaling
pathway for defense responses, while the signals responsible for resistance-associated
WIPK activation have yet to be determined.
raw_completion_output: |-
genes: MAPK; SIPK; WIPK; NahG
proteins: salicylic acid-induced protein kinase; wounding-induced protein kinase
molecules: nitric oxide; salicylic acid; ethylene; jasmonic acid
organisms: tobacco
gene_gene_interactions:
gene_protein_interactions:
gene_organism_relationships:
protein_protein_interactions:
protein_organism_relationships:
gene_molecule_interactions:
protein_molecule_interactions:
label: mitogen-activated protein (MAP) kinases
prompt: |+
From the text below, extract the following entities in the following format:
genes: <A semicolon-separated list of genes.>
proteins: <A semicolon-separated list of proteins.>
molecules: <A semicolon-separated list of molecules.>
organisms: <A semicolon-separated list of taxonomic terms of living things.>
gene_gene_interactions: <A semicolon-separated list of gene-gene interactions.>
gene_protein_interactions: <A semicolon-separated list of gene-protein interactions.>
gene_organism_relationships: <A semicolon-separated list of gene-organism relationships.>
protein_protein_interactions: <A semicolon-separated list of protein-protein interactions.>
protein_organism_relationships: <A semicolon-separated list of protein-organism relationships.>
gene_molecule_interactions: <A semicolon-separated list of gene-molecule interactions.>
protein_molecule_interactions: <A semicolon-separated list of protein-molecule interactions.>
label: <The label (name) of the named thing>
Text:
In tobacco, two mitogen-activated protein (MAP) kinases, designated salicylic acid (SA)-induced protein kinase (SIPK) and wounding-induced protein kinase (WIPK) are activated in a disease resistance-specific manner following pathogen infection or elicitor treatment. To investigate whether nitric oxide (NO), SA, ethylene, or jasmonic acid (JA) are involved in this phenomenon, the ability of these defense signals to activate these kinases was assessed. Both NO and SA activated SIPK; however, they did not activate WIPK. Additional analyses with transgenic NahG tobacco revealed that SA is required for the NO-mediated induction of SIPK. Neither JA nor ethylene activated SIPK or WIPK. Thus, SIPK may function downstream of SA in the NO signaling pathway for defense responses, while the signals responsible for resistance-associated WIPK activation have yet to be determined.
===
extracted_object:
id: 6a86d066-3c07-4b2a-ae25-a1d62a587dda
label: mitogen-activated protein (MAP) kinases
genes:
- GO:0004707
- AUTO:SIPK
- AUTO:WIPK
- AUTO:NahG
proteins:
- AUTO:salicylic%20acid-induced%20protein%20kinase
- AUTO:wounding-induced%20protein%20kinase
molecules:
- CHEBI:16480
- CHEBI:16914
- CHEBI:18153
- CHEBI:18292
organisms:
- NCBITaxon:4097
named_entities:
- id: GO:0004707
label: MAPK
- id: AUTO:SIPK
label: SIPK
- id: AUTO:WIPK
label: WIPK
- id: AUTO:NahG
label: NahG
- id: AUTO:salicylic%20acid-induced%20protein%20kinase
label: salicylic acid-induced protein kinase
- id: AUTO:wounding-induced%20protein%20kinase
label: wounding-induced protein kinase
- id: CHEBI:16480
label: nitric oxide
- id: CHEBI:16914
label: salicylic acid
- id: CHEBI:18153
label: ethylene
- id: CHEBI:18292
label: jasmonic acid
- id: NCBITaxon:4097
label: tobacco
---
input_text: Recent evidence suggests that oxidized lipid-derived molecules play significant
roles in inducible plant defence responses against microbial pathogens, either by
directly deterring parasite multiplication, or as signals involved in the induction
of sets of defence genes. The synthesis of these oxylipins was hypothesized to be
initiated by the phospholipase A2-mediated release of unsaturated fatty acids from
membrane lipids. Here, we demonstrate that, in tobacco leaves reacting hypersensitively
to tobacco mosaic virus, a strong increase in soluble phospholipase A2 (PLA2) activity
occurs at the onset of necrotic lesion appearance. This rapid PLA2 activation occurred
before the accumulation of 12-oxophytodienoic and jasmonic acids, two fatty acid-derived
defence signals. Three PLA2 isoforms were separated and the most active enzyme was
partially purified, its N-terminal sequence displaying similarity with patatin,
the major storage protein in potato tubers. Three related tobacco patatin-like cDNAs,
called NtPat1, NtPat2 and NtPat3, were cloned, with NtPat2 encoding the PLA2 isolated
from infected leaves. RT-PCR experiments showed a rapid transcriptional activation
of the three NtPat genes in virus-infected leaves, preceding the increase in PLA2
activity. Recombinant NtPat1 and NtPat3 enzymes were active in an assay using labelled
bacterial membranes, and also displayed high bona fide PLA2 activity on phosphatidylcholine
substrate. These results point to a possible new role of patatin-like phospholipases
in inducible plant defence responses. The induction kinetics together with the enzymatic
activity data indicate that the NtPat proteins may provide precursors for oxylipin
synthesis during the hypersensitive response to pathogens.
raw_completion_output: |-
genes: NtPat1; NtPat2; NtPat3
proteins: phospholipase A2 (PLA2); patatin
molecules: 12-oxophytodienoic acid; jasmonic acid
organisms: tobacco; tobacco mosaic virus
gene_gene_interactions:
gene_protein_interactions: NtPat2 encodes the PLA2 isolated from infected leaves
gene_organism_relationships: rapid transcriptional activation of NtPat genes in virus-infected leaves
protein_protein_interactions:
protein_organism_relationships:
gene_molecule_interactions:
protein_molecule_interactions:
label: oxidized lipid-derived molecules
prompt: |+
From the text below, extract the following entities in the following format:
gene: <the value for gene>
organism: <the value for organism>
Text:
rapid transcriptional activation of NtPat genes in virus-infected leaves
===
extracted_object:
id: 8ea1b738-89ed-4b2b-b03d-92df6792a2c7
label: oxidized lipid-derived molecules
genes:
- AUTO:NtPat1
- AUTO:NtPat2
- AUTO:NtPat3
proteins:
- PR:000012798
- AUTO:patatin
molecules:
- CHEBI:15560
- CHEBI:18292
organisms:
- NCBITaxon:4097
- NCBITaxon:12242
gene_protein_interactions:
- gene: AUTO:NtPat2
protein: PR:000012798
gene_organism_relationships:
- gene: AUTO:NtPat
organism: AUTO:virus-infected%20leaves
named_entities:
- id: GO:0004707
label: MAPK
- id: AUTO:SIPK
label: SIPK
- id: AUTO:WIPK
label: WIPK
- id: AUTO:NahG
label: NahG
- id: AUTO:salicylic%20acid-induced%20protein%20kinase
label: salicylic acid-induced protein kinase
- id: AUTO:wounding-induced%20protein%20kinase
label: wounding-induced protein kinase
- id: CHEBI:16480
label: nitric oxide
- id: CHEBI:16914
label: salicylic acid
- id: CHEBI:18153
label: ethylene
- id: CHEBI:18292
label: jasmonic acid
- id: NCBITaxon:4097
label: tobacco
- id: AUTO:NtPat1
label: NtPat1
- id: AUTO:NtPat2
label: NtPat2
- id: AUTO:NtPat3
label: NtPat3
- id: PR:000012798
label: phospholipase A2 (PLA2)
- id: AUTO:patatin
label: patatin
- id: CHEBI:15560
label: 12-oxophytodienoic acid
- id: NCBITaxon:12242
label: tobacco mosaic virus
- id: AUTO:NtPat
label: NtPat
- id: AUTO:virus-infected%20leaves
label: virus-infected leaves
---
input_text: We conducted a study of the patterns and dynamics of oxidized fatty acid
derivatives (oxylipins) in potato leaves infected with the late-blight pathogen
Phytophthora infestans. Two 18-carbon divinyl ether fatty acids, colneleic acid
and colnelenic acid, accumulated during disease development. To date, there are
no reports that such compounds have been detected in higher plants. The divinyl
ether fatty acids accumulate more rapidly in potato cultivar Matilda (a cultivar
with increased resistance to late blight) than in cultivar Bintje, a susceptible
cultivar. Colnelenic acid reached levels of up to approximately 24 nmol (7 microgram)
per g fresh weight of tissue in infected leaves. By contrast, levels of members
of the jasmonic acid family did not change significantly during pathogenesis. The
divinyl ethers also accumulated during the incompatible interaction of tobacco with
tobacco mosaic virus. Colneleic and colnelenic acids were found to be inhibitory
to P. infestans, suggesting a function in plant defense for divinyl ethers, which
are unstable compounds rarely encountered in biological systems.
raw_completion_output: |-
genes: N/A
proteins: N/A
molecules: oxylipins; colneleic acid; colnelenic acid; jasmonic acid
organisms: Phytophthora infestans; tobacco mosaic virus
gene_gene_interactions: N/A
gene_protein_interactions: N/A
gene_organism_relationships: N/A
protein_protein_interactions: N/A
protein_organism_relationships: N/A
gene_molecule_interactions: N/A
protein_molecule_interactions: N/A
label: divinyl ether fatty acids
prompt: |+
Split the following piece of text into fields in the following format:
protein: <the name of the protein.>
molecule: <the name of the molecule.>
Text:
N/A
===
extracted_object:
id: a5add351-be1d-47b3-84ca-4c35cbf80c31
label: divinyl ether fatty acids
genes:
- AUTO:N/A
proteins:
- AUTO:N/A
molecules:
- CHEBI:61121
- CHEBI:60956
- CHEBI:60959
- CHEBI:18292
organisms:
- NCBITaxon:4787
- NCBITaxon:12242
gene_gene_interactions:
- gene1: AUTO:N/A
gene2: AUTO:N/A
gene_protein_interactions:
- gene: AUTO:N/A
protein: AUTO:N/A
gene_organism_relationships:
- gene: AUTO:N/A
organism: AUTO:N/A
protein_protein_interactions:
- protein1: AUTO:N/A
protein2: AUTO:N/A
protein_organism_relationships:
- gene: AUTO:N/A
organism: AUTO:N/A
gene_molecule_interactions:
- gene: AUTO:N/A
molecule: AUTO:N/A
protein_molecule_interactions:
- protein: AUTO:N/A
molecule: AUTO:N/A
named_entities:
- id: GO:0004707
label: MAPK
- id: AUTO:SIPK
label: SIPK
- id: AUTO:WIPK
label: WIPK
- id: AUTO:NahG
label: NahG
- id: AUTO:salicylic%20acid-induced%20protein%20kinase
label: salicylic acid-induced protein kinase
- id: AUTO:wounding-induced%20protein%20kinase
label: wounding-induced protein kinase
- id: CHEBI:16480
label: nitric oxide
- id: CHEBI:16914
label: salicylic acid
- id: CHEBI:18153
label: ethylene
- id: CHEBI:18292
label: jasmonic acid
- id: NCBITaxon:4097
label: tobacco
- id: AUTO:NtPat1
label: NtPat1
- id: AUTO:NtPat2
label: NtPat2
- id: AUTO:NtPat3
label: NtPat3
- id: PR:000012798
label: phospholipase A2 (PLA2)
- id: AUTO:patatin
label: patatin
- id: CHEBI:15560
label: 12-oxophytodienoic acid
- id: NCBITaxon:12242
label: tobacco mosaic virus
- id: AUTO:NtPat
label: NtPat
- id: AUTO:virus-infected%20leaves
label: virus-infected leaves
- id: AUTO:N/A
label: N/A
- id: CHEBI:61121
label: oxylipins
- id: CHEBI:60956
label: colneleic acid
- id: CHEBI:60959
label: colnelenic acid
- id: NCBITaxon:4787
label: Phytophthora infestans
---
input_text: The plant-signaling molecules salicylic acid (SA) and jasmonic acid (JA)
play an important role in induced disease resistance pathways. Cross-talk between
SA- and JA-dependent pathways can result in inhibition of JA-mediated defense responses.
We investigated possible antagonistic interactions between the SA-dependent systemic
acquired resistance (SAR) pathway, which is induced upon pathogen infection, and
the JA-dependent induced systemic resistance (ISR) pathway, which is triggered by
nonpathogenic Pseudomonas rhizobacteria. In Arabidopsis thaliana, SAR and ISR are
effective against a broad spectrum of pathogens, including the foliar pathogen Pseudomonas
syringae pv. tomato (Pst). Simultaneous activation of SAR and ISR resulted in an
additive effect on the level of induced protection against Pst. In Arabidopsis genotypes
that are blocked in either SAR or ISR, this additive effect was not evident. Moreover,
induction of ISR did not affect the expression of the SAR marker gene PR-1 in plants
expressing SAR. Together, these observations demonstrate that the SAR and the ISR
pathway are compatible and that there is no significant cross-talk between these
pathways. SAR and ISR both require the key regulatory protein NPR1. Plants expressing
both types of induced resistance did not show elevated Npr1 transcript levels, indicating
that the constitutive level of NPR1 is sufficient to facilitate simultaneous expression
of SAR and ISR. These results suggest that the enhanced level of protection is established
through parallel activation of complementary, NPR1-dependent defense responses that
are both active against Pst. Therefore, combining SAR and ISR provides an attractive
tool for the improvement of disease control.
raw_completion_output: |-
genes: NPR1; PR-1
proteins: NPR1
molecules: salicylic acid (SA); jasmonic acid (JA)
organisms: Arabidopsis thaliana; Pseudomonas rhizobacteria; Pseudomonas syringae pv. tomato (Pst)
gene_gene_interactions:
gene_protein_interactions: NPR1-PR-1
gene_organism_relationships:
protein_protein_interactions:
protein_organism_relationships:
gene_molecule_interactions:
protein_molecule_interactions:
label: salicylic acid; jasmonic acid; systemic acquired resistance; induced systemic resistance; NPR1; Pseudomonas syringae pv. tomato; PR-1; Arabidopsis thaliana
prompt: |+
Split the following piece of text into fields in the following format:
gene: <the name of the gene.>
protein: <the name of the protein.>
Text:
NPR1-PR-1
===
extracted_object:
id: 30dce43d-87c4-401a-b0b2-fe6c8d8092dd
label: salicylic acid; jasmonic acid; systemic acquired resistance; induced systemic
resistance; NPR1; Pseudomonas syringae pv. tomato; PR-1; Arabidopsis thaliana
genes:
- AUTO:NPR1
- AUTO:PR-1
proteins:
- PR:000011377
molecules:
- CHEBI:35962
- CHEBI:18292
organisms:
- NCBITaxon:3702
- AUTO:Pseudomonas%20rhizobacteria
- NCBITaxon:323
gene_protein_interactions:
- gene: AUTO:NPR1
protein: AUTO:PR-1
named_entities:
- id: GO:0004707
label: MAPK
- id: AUTO:SIPK
label: SIPK
- id: AUTO:WIPK
label: WIPK
- id: AUTO:NahG
label: NahG
- id: AUTO:salicylic%20acid-induced%20protein%20kinase
label: salicylic acid-induced protein kinase
- id: AUTO:wounding-induced%20protein%20kinase
label: wounding-induced protein kinase
- id: CHEBI:16480
label: nitric oxide
- id: CHEBI:16914
label: salicylic acid
- id: CHEBI:18153
label: ethylene
- id: CHEBI:18292
label: jasmonic acid
- id: NCBITaxon:4097
label: tobacco
- id: AUTO:NtPat1
label: NtPat1
- id: AUTO:NtPat2
label: NtPat2
- id: AUTO:NtPat3
label: NtPat3
- id: PR:000012798
label: phospholipase A2 (PLA2)
- id: AUTO:patatin
label: patatin
- id: CHEBI:15560
label: 12-oxophytodienoic acid
- id: NCBITaxon:12242
label: tobacco mosaic virus
- id: AUTO:NtPat
label: NtPat
- id: AUTO:virus-infected%20leaves
label: virus-infected leaves
- id: AUTO:N/A
label: N/A
- id: CHEBI:61121
label: oxylipins
- id: CHEBI:60956
label: colneleic acid
- id: CHEBI:60959
label: colnelenic acid
- id: NCBITaxon:4787
label: Phytophthora infestans
- id: AUTO:NPR1
label: NPR1
- id: AUTO:PR-1
label: PR-1
- id: PR:000011377
label: NPR1
- id: CHEBI:35962
label: salicylic acid (SA)
- id: NCBITaxon:3702
label: Arabidopsis thaliana
- id: AUTO:Pseudomonas%20rhizobacteria
label: Pseudomonas rhizobacteria
- id: NCBITaxon:323
label: Pseudomonas syringae pv. tomato (Pst)
---
input_text: 'The phytoalexin-deficient Arabidopsis mutant pad3-1, which is affected
in the production of the indole-type phytoalexin camalexin, has previously been
shown not to display altered susceptibility to either the bacterium Pseudomonas
syringae (Glazebrook & Ausubel 1994; Proc. Natl. Acad. Sci. USA, 91: 8955-8959)
or the biotrophic fungi Peronospora parasitica (Glazebrook et al. 1997; Genetics,
146: 381-392) and Erysiphe orontii (Reuber et al. 1998; Plant J. 16: 473-485). We
now show that this mutant is markedly more susceptible than its wild-type parental
line to infection by the necrotrophic fungus Alternaria brassicicola, but not to
Botrytis cinerea. A strong camalexin response was elicited in wild-type plants inoculated
with either Alternaria brassicicola or Botrytis cinerea, whereas no camalexin could
be detected in pad3-1 challenged with these fungi. Hence, PAD3 appears to be a key
determinant in resistance to at least A. brassicicola. The induction of salicylate-dependent
and jasmonate/ethylene-dependent defense genes was not reduced in Alternaria-challenged
pad3-1 plants compared to similarly treated wild-type plants. Camalexin production
could not be triggered by exogenous application of either salicylate, ethylene or
jasmonate and was not, or not strongly, reduced in mutants with defects in perception
of these defense-related signal molecules. Camalexin-production appears to be controlled
by a pathway that exhibits little cross-talk with salicylate-, ethylene- and jasmonate-dependent
signalling events.'
raw_completion_output: |-
genes: pad3-1; Pseudomonas syringae; Peronospora parasitica; Erysiphe orontii; Alternaria brassicicola; Botrytis cinerea
proteins: PAD3
molecules: camalexin; salicylate; ethylene; jasmonate
organisms: Arabidopsis; Pseudomonas syringae; Peronospora parasitica; Erysiphe orontii; Alternaria brassicicola; Botrytis cinerea
gene_gene_interactions:
gene_protein_interactions:
gene_organism_relationships:
protein_protein_interactions:
protein_organism_relationships:
gene_molecule_interactions:
protein_molecule_interactions:
label: phytoalexin-deficient Arabidopsis mutant
prompt: |+
From the text below, extract the following entities in the following format:
genes: <A semicolon-separated list of genes.>
proteins: <A semicolon-separated list of proteins.>
molecules: <A semicolon-separated list of molecules.>
organisms: <A semicolon-separated list of taxonomic terms of living things.>
gene_gene_interactions: <A semicolon-separated list of gene-gene interactions.>
gene_protein_interactions: <A semicolon-separated list of gene-protein interactions.>
gene_organism_relationships: <A semicolon-separated list of gene-organism relationships.>
protein_protein_interactions: <A semicolon-separated list of protein-protein interactions.>
protein_organism_relationships: <A semicolon-separated list of protein-organism relationships.>
gene_molecule_interactions: <A semicolon-separated list of gene-molecule interactions.>
protein_molecule_interactions: <A semicolon-separated list of protein-molecule interactions.>
label: <The label (name) of the named thing>
Text:
The phytoalexin-deficient Arabidopsis mutant pad3-1, which is affected in the production of the indole-type phytoalexin camalexin, has previously been shown not to display altered susceptibility to either the bacterium Pseudomonas syringae (Glazebrook & Ausubel 1994; Proc. Natl. Acad. Sci. USA, 91: 8955-8959) or the biotrophic fungi Peronospora parasitica (Glazebrook et al. 1997; Genetics, 146: 381-392) and Erysiphe orontii (Reuber et al. 1998; Plant J. 16: 473-485). We now show that this mutant is markedly more susceptible than its wild-type parental line to infection by the necrotrophic fungus Alternaria brassicicola, but not to Botrytis cinerea. A strong camalexin response was elicited in wild-type plants inoculated with either Alternaria brassicicola or Botrytis cinerea, whereas no camalexin could be detected in pad3-1 challenged with these fungi. Hence, PAD3 appears to be a key determinant in resistance to at least A. brassicicola. The induction of salicylate-dependent and jasmonate/ethylene-dependent defense genes was not reduced in Alternaria-challenged pad3-1 plants compared to similarly treated wild-type plants. Camalexin production could not be triggered by exogenous application of either salicylate, ethylene or jasmonate and was not, or not strongly, reduced in mutants with defects in perception of these defense-related signal molecules. Camalexin-production appears to be controlled by a pathway that exhibits little cross-talk with salicylate-, ethylene- and jasmonate-dependent signalling events.
===
extracted_object:
id: 57280cfa-7fba-4016-9e6a-8682b107702d
label: phytoalexin-deficient Arabidopsis mutant
genes:
- AUTO:pad3-1
- AUTO:Pseudomonas%20syringae
- AUTO:Peronospora%20parasitica
- AUTO:Erysiphe%20orontii
- AUTO:Alternaria%20brassicicola
- AUTO:Botrytis%20cinerea
proteins:
- PR:000012221
molecules:
- CHEBI:22990
- CHEBI:30762
- CHEBI:18153
- CHEBI:58431
organisms:
- NCBITaxon:3701
- NCBITaxon:317
- NCBITaxon:123356
- NCBITaxon:62715
- NCBITaxon:29001
- NCBITaxon:40559
named_entities:
- id: GO:0004707
label: MAPK
- id: AUTO:SIPK
label: SIPK
- id: AUTO:WIPK
label: WIPK
- id: AUTO:NahG
label: NahG
- id: AUTO:salicylic%20acid-induced%20protein%20kinase
label: salicylic acid-induced protein kinase
- id: AUTO:wounding-induced%20protein%20kinase
label: wounding-induced protein kinase
- id: CHEBI:16480
label: nitric oxide
- id: CHEBI:16914
label: salicylic acid
- id: CHEBI:18153
label: ethylene
- id: CHEBI:18292
label: jasmonic acid
- id: NCBITaxon:4097
label: tobacco
- id: AUTO:NtPat1
label: NtPat1
- id: AUTO:NtPat2
label: NtPat2
- id: AUTO:NtPat3
label: NtPat3
- id: PR:000012798
label: phospholipase A2 (PLA2)
- id: AUTO:patatin
label: patatin
- id: CHEBI:15560
label: 12-oxophytodienoic acid
- id: NCBITaxon:12242
label: tobacco mosaic virus
- id: AUTO:NtPat
label: NtPat
- id: AUTO:virus-infected%20leaves
label: virus-infected leaves
- id: AUTO:N/A
label: N/A
- id: CHEBI:61121
label: oxylipins
- id: CHEBI:60956
label: colneleic acid
- id: CHEBI:60959
label: colnelenic acid
- id: NCBITaxon:4787
label: Phytophthora infestans
- id: AUTO:NPR1
label: NPR1
- id: AUTO:PR-1
label: PR-1
- id: PR:000011377
label: NPR1
- id: CHEBI:35962
label: salicylic acid (SA)
- id: NCBITaxon:3702
label: Arabidopsis thaliana
- id: AUTO:Pseudomonas%20rhizobacteria
label: Pseudomonas rhizobacteria
- id: NCBITaxon:323
label: Pseudomonas syringae pv. tomato (Pst)
- id: AUTO:pad3-1
label: pad3-1
- id: AUTO:Pseudomonas%20syringae
label: Pseudomonas syringae
- id: AUTO:Peronospora%20parasitica
label: Peronospora parasitica
- id: AUTO:Erysiphe%20orontii
label: Erysiphe orontii
- id: AUTO:Alternaria%20brassicicola
label: Alternaria brassicicola
- id: AUTO:Botrytis%20cinerea
label: Botrytis cinerea
- id: PR:000012221
label: PAD3
- id: CHEBI:22990
label: camalexin
- id: CHEBI:30762
label: salicylate
- id: CHEBI:58431
label: jasmonate
- id: NCBITaxon:3701
label: Arabidopsis
- id: NCBITaxon:317
label: Pseudomonas syringae
- id: NCBITaxon:123356
label: Peronospora parasitica
- id: NCBITaxon:62715
label: Erysiphe orontii
- id: NCBITaxon:29001
label: Alternaria brassicicola
- id: NCBITaxon:40559
label: Botrytis cinerea
Thanks for pointing this out @serenalotreck - should be a quick fix.
@caufieldjh wondering if this has been fixed? I'm running OntoGPT on a large quantity of documents, and the ballooning size of the YAML is severely slowing down my ability to parse it into KGX format: it takes several hours just to read in the YAML file.
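One possible way to avoid reading the whole file in a single pass is to split it into per-document chunks and parse them one at a time. This is only a sketch: `iter_yaml_documents` is a hypothetical helper, and it assumes that every document in `output.txt` starts with a line that is exactly `---` and that no such line occurs inside a block scalar.

```python
import io


def iter_yaml_documents(stream):
    """Yield each '---'-separated document as a string, one at a time."""
    buffer = []
    for line in stream:
        if line.rstrip("\n") == "---":
            # A separator closes the previous document, if there was one.
            if buffer:
                yield "".join(buffer)
            buffer = []
        else:
            buffer.append(line)
    # Emit the final document, which has no trailing separator.
    if buffer:
        yield "".join(buffer)


sample = io.StringIO("---\ninput_text: first\n---\ninput_text: second\n")
chunks = list(iter_yaml_documents(sample))
print(len(chunks))  # 2
```

Each chunk could then be handed to `yaml.safe_load` on its own, so a failure points at one document instead of the whole file.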
Not related, but the slowness of the import is causing problems: it seems like ChatGPT is putting non-allowed Unicode characters in random places, which breaks YAML `safe_load`, and it takes several hours to locate each one by trying to read the file and having it break again. I'm currently trying to find them all preemptively and remove them before reading in the YAML file, but it seems like something that shouldn't be happening in the first place. I haven't tried making a small reproducible example (am under a deadline), so I won't open a new issue yet, but wondered if you'd experienced anything similar.
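A rough sketch of the "find them all preemptively" approach, assuming the offending characters are the ones outside the printable character set defined in the YAML 1.2 specification. The helper names `find_non_printable` and `strip_non_printable` are made up for this example.

```python
import re

# Complement of the printable character set from the YAML 1.2 specification:
# tab, LF, CR, printable ASCII, NEL, and most of the remaining Unicode range.
NON_PRINTABLE = re.compile(
    r"[^\x09\x0A\x0D\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_non_printable(text):
    """Return (offset, repr of character) for each disallowed character."""
    return [(m.start(), repr(m.group())) for m in NON_PRINTABLE.finditer(text)]


def strip_non_printable(text, replacement=""):
    """Remove (or replace) every character YAML does not allow."""
    return NON_PRINTABLE.sub(replacement, text)


dirty = "label: nitric\x00 oxide\x08\n"
print(find_non_printable(dirty))
print(strip_non_printable(dirty))
```

Running the scan once over the raw text reports every offending position in a single pass, rather than one position per failed load.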
Hi @serenalotreck - going to attempt a fix for this today.
I haven't explicitly seen any issues with GPT emitting weird Unicode characters, but it seems bound to happen in any sufficiently large collection of extractions, and we've seen something potentially related when extracting from many PubMed entries. I'm going to consider this issue related to #323, as there should be preprocessing to handle it.
OK, please try pulling the most recent repo version and let me know if you're still seeing redundant named entities.
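For anyone wanting to check this on their own output, here is a small sketch that flags entries in `named_entities` whose IDs are not referenced anywhere in `extracted_object`. It assumes each document has already been parsed into a dict with those two top-level keys, as in the example above; `collect_strings` and `redundant_entities` are hypothetical helpers, not part of OntoGPT.

```python
def collect_strings(value):
    """Gather every string found anywhere in a nested dict/list structure."""
    found = set()
    if isinstance(value, str):
        found.add(value)
    elif isinstance(value, dict):
        for item in value.values():
            found |= collect_strings(item)
    elif isinstance(value, list):
        for item in value:
            found |= collect_strings(item)
    return found


def redundant_entities(document):
    """Return named entities whose IDs never appear in extracted_object."""
    used = collect_strings(document.get("extracted_object", {}))
    return [e for e in document.get("named_entities", []) if e["id"] not in used]


doc = {
    "extracted_object": {
        "genes": ["AUTO:NtPat1"],
        "gene_organism_relationships": [
            {"gene": "AUTO:NtPat", "organism": "AUTO:virus-infected%20leaves"}
        ],
    },
    "named_entities": [
        {"id": "GO:0004707", "label": "MAPK"},  # left over from another doc
        {"id": "AUTO:NtPat1", "label": "NtPat1"},
        {"id": "AUTO:NtPat", "label": "NtPat"},
    ],
}
print(redundant_entities(doc))
```

An empty result for every document would suggest the accumulation is gone.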
Looks like that fixed it, thanks!