Merging 14 Ontologies (huge merge)

Open OliverHex opened this issue 5 months ago • 6 comments

Hello,

I am trying to merge 14 ontologies at once with Boomer: DERMO, DO, HUGO, ICDO, IDO, IEDB, MESH, MFOMD, MPATH, NCIT, OBI, OGMS, ORPHANET and SCDO.

This is how I proceed:

  • I compute the 91 LOGMAP alignments between every pair of ontologies (i.e. 91 = n(n-1)/2 with n=14)
  • I convert and merge these alignments into a single ptable (Boomer format)
  • I join all these ontologies into a single "union" OWL file (622K classes ~ 2.5 GB)
  • I launch Boomer on the union OWL file and the single ptable (54K entries ~ 7 MB).
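
A minimal sketch of the first two steps above, for illustration only. It checks the pair count and shows one way the per-pair ptables could be combined. The helper names `ontology_pairs` and `merge_ptable_rows` are hypothetical, and the row layout (first two fields are the mapped term identifiers, remaining fields are passed through unchanged) is an assumption, not a statement about the exact Boomer ptable format.

```python
from itertools import combinations

ONTOLOGIES = [
    "DERMO", "DO", "HUGO", "ICDO", "IDO", "IEDB", "MESH",
    "MFOMD", "MPATH", "NCIT", "OBI", "OGMS", "ORPHANET", "SCDO",
]


def ontology_pairs(names):
    """Return every unordered pair of ontologies, one per alignment."""
    return list(combinations(names, 2))


def merge_ptable_rows(tables):
    """Concatenate ptable rows, keeping the first row seen for each term pair.

    Assumption: each row is a tuple whose first two fields are the mapped
    term identifiers; all other fields are carried along untouched.
    """
    merged = {}
    for rows in tables:
        for row in rows:
            key = (row[0], row[1])
            # setdefault keeps the earliest row when a pair appears twice
            merged.setdefault(key, row)
    return list(merged.values())


pairs = ontology_pairs(ONTOLOGIES)
print(len(pairs))  # 91, i.e. 14 * 13 / 2
```

Keeping the first row for a duplicated term pair is just one possible policy; the original post does not say how duplicates are handled.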

I have run various tests and it seems that when the ptable is too large, the problem becomes intractable.

After removing MESH and NCIT (i.e. merging only 12 ontologies), the resulting union ontology has only 81K classes (242 MB) and the ptable contains only 7K entries. In this case, Boomer finishes with a result in 30 min (on an i7 at 1.90 GHz with 32 GB RAM).

But I also need MESH and NCIT to be included in my merge result.

Overall, I am wondering whether this is the correct way to proceed.

Here are some questions:

  1. Should I continue with this strategy, i.e. keep trying to merge everything at once, so that Boomer has complete decision power over selecting the best mappings (without introducing any bias)?

  2. Or should I change my merging strategy, i.e. split the problem into smaller sub-problems, organize them in some order (according to some criteria, which could introduce some bias), and launch Boomer following this order?

    For example, I could try this:

      • I convert the 91 alignments into 91 ptables (instead of converting and merging them into 1 single ptable).
      • For each of the 91 ptables:
          • I launch Boomer with this ptable and the union OWL file.
          • In the union OWL file, I add all the equivalence axioms generated by Boomer for this ptable.

    So far, this seems to work much faster. The problem is that the arbitrary order of the for-loop introduces a bias, since each equivalence axiom added at one step influences Boomer's results in the following steps.
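
The order dependence of that per-ptable loop can be illustrated with a small sketch. Everything here is hypothetical: `run_boomer` is a placeholder callable standing in for however Boomer is actually invoked, and `toy_resolver` is a deliberately naive stand-in (not Boomer's algorithm) that accepts a mapping only if neither term already appears in an accepted equivalence.

```python
def sequential_merge(ptables, union_axioms, run_boomer):
    """Process one ptable at a time, feeding accepted equivalences forward.

    `run_boomer` is a hypothetical callable (ptable, axioms) -> iterable of
    new equivalence axioms; it is an assumption, not a real Boomer API.
    """
    axioms = set(union_axioms)
    for ptable in ptables:
        new_equivalences = run_boomer(ptable, frozenset(axioms))
        # Axioms accepted here constrain every later iteration
        axioms |= set(new_equivalences)
    return axioms


def toy_resolver(ptable, axioms):
    """Naive stand-in: accept a pair only if both terms are still unused."""
    used = {term for pair in axioms for term in pair}
    accepted = set()
    for left, right in ptable:
        if left not in used and right not in used:
            accepted.add((left, right))
            used.update((left, right))
    return accepted


ptable_ab = [("A:1", "B:1")]
ptable_ac = [("A:1", "C:1")]

forward = sequential_merge([ptable_ab, ptable_ac], set(), toy_resolver)
backward = sequential_merge([ptable_ac, ptable_ab], set(), toy_resolver)
print(forward == backward)  # False: the result depends on the loop order
```

With this toy resolver, whichever ptable is processed first claims term `A:1`, so the two orders yield different equivalence sets. This is the bias described above, reduced to its simplest form.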

Any suggestions?

Oliver

PS: I couldn't attach the Boomer input union ontology (about 140 MB compressed) because the maximum attachment size is 25 MB. However, the input ptable is attached here: ptable-91-mappings.zip.

OliverHex, Feb 05 '24 12:02