Wrong performance.csv
Describe the bug
When using the evaluation client to run logmap-bio with the DH tracks, I get a wrong performance.csv for the "tadirah-unesco" test case. It shows 0 TP, but the true number is higher when comparing systemAlignment.rdf with the reference.rdf of the test case. I think it is important to find out whether this is an issue on the matcher's side (unproblematic) or an issue in MELT. The latter would be more problematic, since the evaluation relies heavily on the numbers in the performance.csv files.
To Reproduce
Steps to reproduce the behavior:
- Version of MELT: evaluation client from documentation
- Java version: openjdk version "1.8.0_422"
- Python version: 3.8.18
- Operating system: macOS 14.4
- Run:
java -jar matching-eval-client-latest.jar --systems ../Matcher/DockerMatcher/logmap-bio-melt-oaei-2021-web-latest.tar.gz --track http://oaei.webdatacommons.org/tdrs/ dh 2024all --results oaei2024_logmapbio_oaeidh_$(date +"%Y-%m-%d_%H-%M-%S")
performance.csv:
Type,Precision (P),Recall (R),Residual Recall (R+),F1,"# of TP","# of FP","# of FN","# of Correspondences",Time,Time (HH:MM:SS)
ALL,0.0,0.0,0.0,0.0,0,0,15,0,3763809750,00:00:03
CLASSES,0.0,0.0,0.0,0.0,0,0,0,0,-,-
PROPERTIES,0.0,0.0,0.0,0.0,0,0,0,0,-,-
INSTANCES,0.0,0.0,0.0,0.0,0,0,15,0,-,-
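As a quick sanity check on these rows, the following sketch derives the alignment sizes implied by the counts. It relies only on the definitions TP + FP = size of the system alignment and TP + FN = size of the reference; that the "# of Correspondences" column reports the system alignment size is my assumption, not something confirmed by the MELT documentation. The helper name `summarize` is made up for this example.

```python
import csv
import io

# Rows copied verbatim from the performance.csv quoted above.
PERFORMANCE_CSV = '''Type,Precision (P),Recall (R),Residual Recall (R+),F1,"# of TP","# of FP","# of FN","# of Correspondences",Time,Time (HH:MM:SS)
ALL,0.0,0.0,0.0,0.0,0,0,15,0,3763809750,00:00:03
CLASSES,0.0,0.0,0.0,0.0,0,0,0,0,-,-
PROPERTIES,0.0,0.0,0.0,0.0,0,0,0,0,-,-
INSTANCES,0.0,0.0,0.0,0.0,0,0,15,0,-,-
'''


def summarize(csv_text):
    """Map each row type to the alignment sizes implied by its counts."""
    summary = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        tp = int(row["# of TP"])
        fp = int(row["# of FP"])
        fn = int(row["# of FN"])
        summary[row["Type"]] = {
            "system_size": tp + fp,      # correspondences the evaluator saw from the matcher
            "reference_size": tp + fn,   # correspondences in the reference
            # Assumption: this column is the system alignment size.
            "reported_size": int(row["# of Correspondences"]),
        }
    return summary


summary = summarize(PERFORMANCE_CSV)
print(summary["ALL"])
# {'system_size': 0, 'reference_size': 15, 'reported_size': 0}
```

If that reading is right, the evaluator counted zero system correspondences, which would point to the system alignment being empty or unreadable at evaluation time rather than to correspondences being compared and rejected. This is only a hypothesis.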
systemAlignment.rdf: (removed unneeded alignments for readability)
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns="http://knowledgeweb.semanticweb.org/heterogeneity/alignment"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
<Alignment>
<xml>yes</xml>
<level>0</level>
<type>??</type>
<onto1>https://vocabs.dariah.eu/tadirah/</onto1>
<onto2>http://logmap-tests/oaei/target.owl</onto2>
<uri1>https://vocabs.dariah.eu/tadirah/</uri1>
<uri2>http://logmap-tests/oaei/target.owl</uri2>
<map>
<Cell>
<entity1 rdf:resource="https://vocabs.dariah.eu/tadirah/cataloging"/>
<entity2 rdf:resource="http://vocabularies.unesco.org/thesaurus/concept3799"/>
<measure rdf:datatype="xsd:float">1.0</measure>
<relation>=</relation>
</Cell>
</map>
...
</Alignment>
</rdf:RDF>
[reference.rdf](https://github.com/FelixFrizzy/DH-benchmark/blob/main/dhcs2_tadirah-unesco/reference.rdf) of the tadirah-unesco test case (removed unneeded alignments for readability)
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns="http://knowledgeweb.semanticweb.org/heterogeneity/alignment"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
<Alignment>
<xml>yes</xml>
<level>0</level>
<type>**</type>
<onto1>https://vocabs.dariah.eu/tadirah/</onto1>
<onto2>http://vocabularies.unesco.org/thesaurus/</onto2>
<uri1>https://vocabs.dariah.eu/tadirah/</uri1>
<uri2>http://vocabularies.unesco.org/thesaurus/</uri2>
<map>
<Cell>
<entity1 rdf:resource="https://vocabs.dariah.eu/tadirah/cataloging"/>
<entity2 rdf:resource="http://vocabularies.unesco.org/thesaurus/concept3799"/>
<measure rdf:datatype="xsd:float">1.0</measure>
<relation>=</relation>
</Cell>
</map>
...
</Alignment>
</rdf:RDF>
The correctly identified correspondence is not reflected in performance.csv, and neither are any of the other TPs.
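The manual comparison can be reproduced independently of MELT with a small script. This is a minimal sketch, not MELT's own evaluation logic: it treats a correspondence as the triple (entity1, entity2, relation) and ignores the header fields (`onto1`, `onto2`, `uri1`, `uri2`), which differ between the two files above. The names `read_cells`, `confusion_counts`, and `sample` are hypothetical helpers, and the inline sample contains only the single cell quoted in this issue.

```python
import io
import xml.etree.ElementTree as ET

# Namespaces used by the alignment files quoted above.
ALIGN = "{http://knowledgeweb.semanticweb.org/heterogeneity/alignment}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def read_cells(source):
    """Return the set of (entity1, entity2, relation) triples in an alignment.

    `source` is a file path or a binary file object.
    """
    cells = set()
    for cell in ET.parse(source).getroot().iter(ALIGN + "Cell"):
        entity1 = cell.find(ALIGN + "entity1").get(RDF + "resource")
        entity2 = cell.find(ALIGN + "entity2").get(RDF + "resource")
        relation = (cell.findtext(ALIGN + "relation") or "").strip()
        cells.add((entity1, entity2, relation))
    return cells


def confusion_counts(system, reference):
    """Count TP, FP, and FN by exact set comparison of the triples."""
    return {
        "tp": len(system & reference),
        "fp": len(system - reference),
        "fn": len(reference - system),
    }


def sample(onto2):
    """Build an in-memory alignment holding the one cell shown in this issue."""
    xml_text = (
        '<rdf:RDF xmlns="http://knowledgeweb.semanticweb.org/heterogeneity/alignment" '
        'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
        "<Alignment>"
        "<onto1>https://vocabs.dariah.eu/tadirah/</onto1>"
        "<onto2>" + onto2 + "</onto2>"
        "<map><Cell>"
        '<entity1 rdf:resource="https://vocabs.dariah.eu/tadirah/cataloging"/>'
        '<entity2 rdf:resource="http://vocabularies.unesco.org/thesaurus/concept3799"/>'
        "<relation>=</relation>"
        "</Cell></map>"
        "</Alignment></rdf:RDF>"
    )
    return io.BytesIO(xml_text.encode("utf-8"))


system = read_cells(sample("http://logmap-tests/oaei/target.owl"))
reference = read_cells(sample("http://vocabularies.unesco.org/thesaurus/"))
counts = confusion_counts(system, reference)
print(counts)
# {'tp': 1, 'fp': 0, 'fn': 0}
```

To check the real files, pass their paths instead, for example `read_cells("systemAlignment.rdf")` and `read_cells("reference.rdf")`.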
Full log output
Expected behavior
The performance.csv should list 10 TP and 5 FN instead of 0 TP and 15 FN.