Spire DB Download?
PR #614 added reference to the SPIRE paired metagenomic databases (spire_ctg10_2401_db). Is this available for public download? Best I can find is the mOTU reference data at http://spire.embl.de/downloads. Any relation?
It’s a database based on spire to improve paired MSA generation. We will upload it soon. @endixk we should discuss this next week.
@martin-steinegger it would be great to hear any updates on this! Thanks.
Hi @martin-steinegger any updates on the availability of the SPIRE database?
Hi @martin-steinegger, I would also like to get this spire_ctg10_2401_db database. Without it, local colabfold_search and local ColabFold MSA servers can't use env pairing, which is necessary to replicate the behavior of your ColabFold MSA server. Could you put it on the ColabFold Databases web page (https://colabfold.mmseqs.com/)? Thanks!
We tested this database during CASP but never released it (we should!). The search should just work fine without it. @endixk could you please upload the database?
Looks to be available at the new download site: https://opendata.mmseqs.org/colabfold
Ah great!
Thanks!
Hi, have any of you gotten the spire_db to work locally with colabfold_search? I'm running into issues generating multimer alignments with the archive hosted on the download site (downloaded and untarred).
I downloaded the spire database but didn't have enough disk space to try it. There were also other reasons I gave up on spire. I was interested in the spire database because Boltz structure prediction uses the ColabFold MSA server and requests pairing of sequences using the env database, which the MSA server code indicates uses spire. But Milot Mirdita told me that the ColabFold MSA server is in fact not using spire: "The env pairing is very surprising since we don't actually offer this database in our servers and it will just crash if you request an env-paired MSA. As Martin mentioned in the GH issue, this was just an experiment for CASP that didn't go particularly well (possibly due to some bugs, we never dug too deep)." Since Milot suggests that pairing with spire didn't give good results and is not offered by the ColabFold MSA server, I did not try it on my local machine.
I think the hosted database is missing the mapping file necessary for pairing. I generated an ad hoc mapping file using sample-level ids (also creating dummy dmp files), which pairs things correctly for my test cases (positive controls in a known operon, i.e. neighbours on the same contig). I suspect mapping at the level of spire MAGs is probably the intended strategy in general. It looks possible to reproduce from public spire data, but I haven't done it yet.
@tlitfin-unsw we did generate a taxonomy lookup that uses each contig, not the MAG. MAGs are a bit unsafe IMO.
OK, thanks Martin, that is good to know! Reproducing the mapping based on contig ID is much easier 😄. My intuition was that in the case of low paired depth there is potential upside to more sensitive MAG-based pairing, and any mis-pairing probably leaves you no worse off (i.e. unlikely to get spurious signal by chance).
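To make the contig-versus-MAG trade-off concrete, here is a toy sketch in Python. It is not the pairing code used by ColabFold or MMseqs2; the `pair_hits` helper, the hit names, and the contig/MAG labels are all made up for illustration. It pairs hits for two chains whenever they share a grouping key and shows that a coarser key (MAG) gives at least as many pairs as a finer one (contig). Real pairing would also need to pick among multiple hits per group, which is not modeled here.

```python
from collections import defaultdict

def pair_hits(hits_a, hits_b, key_of):
    """Pair hits of chain A with hits of chain B that share a grouping key.

    hits_a, hits_b: lists of hit identifiers (hypothetical names).
    key_of: dict mapping each hit identifier to its group (contig or MAG).
    """
    # Index chain B hits by their group so lookups for chain A are cheap.
    by_key = defaultdict(list)
    for hit in hits_b:
        by_key[key_of[hit]].append(hit)

    pairs = []
    for hit in hits_a:
        for partner in by_key.get(key_of[hit], []):
            pairs.append((hit, partner))
    return pairs

# Toy data: a1/b1 sit on the same contig; a2/b2 share a MAG but not a contig.
contig_of = {"a1": "contig1", "b1": "contig1", "a2": "contig2", "b2": "contig3"}
mag_of = {"a1": "magX", "b1": "magX", "a2": "magY", "b2": "magY"}

contig_pairs = pair_hits(["a1", "a2"], ["b1", "b2"], contig_of)
mag_pairs = pair_hits(["a1", "a2"], ["b1", "b2"], mag_of)

print(contig_pairs)  # [('a1', 'b1')]
print(mag_pairs)     # [('a1', 'b1'), ('a2', 'b2')]
```

In this toy case the MAG-level key recovers the extra a2/b2 pair, but whether that pair is a true interaction partner depends on how trustworthy the MAG binning is.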
Thanks all for the insight here, this is very helpful. @martin-steinegger or @tlitfin-unsw would it be possible to point to the tax db/mappings that you all used?
Better to use the official file from @martin-steinegger if it is available, but I sent you the file I have been using via FileSender. I assigned each contig an arbitrary unique taxon id.
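For anyone reproducing this, a minimal Python sketch of the "arbitrary unique taxon id per contig" idea. Several things here are assumptions not confirmed in this thread: that protein identifiers look like `<contig>_<gene index>`, that the mapping is a two-column tab-separated file of identifier and taxon id, and that the starting id is arbitrary. `contig_of`, `build_mapping`, and `write_mapping` are hypothetical helper names, and the example identifiers are invented. Check the result against the official file if it becomes available.

```python
import io

def contig_of(protein_id):
    """Strip a trailing '_<gene index>' to recover the contig (assumed ID layout)."""
    contig, sep, gene = protein_id.rpartition("_")
    return contig if sep and gene.isdigit() else protein_id

def build_mapping(protein_ids, first_taxid=1_000_000):
    """Give each distinct contig an arbitrary unique integer id.

    Returns (protein_id, taxid) rows; proteins on the same contig share an id.
    The starting value is arbitrary and only needs to avoid collisions.
    """
    taxid_of_contig = {}
    rows = []
    for pid in protein_ids:
        contig = contig_of(pid)
        if contig not in taxid_of_contig:
            taxid_of_contig[contig] = first_taxid + len(taxid_of_contig)
        rows.append((pid, taxid_of_contig[contig]))
    return rows

def write_mapping(rows, handle):
    """Write rows as 'identifier<TAB>taxid', one per line (assumed format)."""
    for pid, taxid in rows:
        handle.write(f"{pid}\t{taxid}\n")

# Invented identifiers: two genes on one contig, one gene on another.
ids = ["sampleA_k141_12_1", "sampleA_k141_12_2", "sampleA_k141_99_1"]
rows = build_mapping(ids)

buf = io.StringIO()
write_mapping(rows, buf)
print(buf.getvalue(), end="")
```

With this layout, the two genes on `sampleA_k141_12` get the same id and can be paired, while the gene on `sampleA_k141_99` gets a different one.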
Thank you!