ColabFold

Spire DB Download?

Open • brianloyal opened this issue 11 months ago • 16 comments

PR #614 added a reference to the SPIRE paired metagenomic database (spire_ctg10_2401_db). Is this available for public download? The closest I can find is the mOTU reference data at http://spire.embl.de/downloads. Is there any relation?

brianloyal · Feb 06 '25 21:02

It's a database based on SPIRE, built to improve paired MSA generation. We will upload it soon. @endixk, we should discuss this next week.

martin-steinegger · Feb 07 '25 11:02

@martin-steinegger, it would be great to hear any updates on this! Thanks.

punit-jha123 · Apr 10 '25 22:04

Hi @martin-steinegger, are there any updates on the availability of the SPIRE database?

yab-fsp · Jul 03 '25 20:07

Hi @martin-steinegger, I would also like to get the spire_ctg10_2401_db database. Without it, local colabfold_search and locally hosted ColabFold MSA servers can't use env pairing, which is necessary to replicate the behavior of your ColabFold MSA server. Could you put it on the ColabFold Databases web page (https://colabfold.mmseqs.com/)? Thanks!

tomgoddard · Aug 23 '25 01:08

We tested this database during CASP but never released it (we should!). The search should work fine without it. @endixk, could you please upload the database?

martin-steinegger · Aug 23 '25 04:08

Looks to be available at the new download site: https://opendata.mmseqs.org/colabfold

tlitfin-unsw · Aug 23 '25 05:08

Ah great!

martin-steinegger · Aug 23 '25 06:08

Thanks!

tomgoddard · Aug 25 '25 19:08

Hi, has anyone gotten spire_db to work locally with colabfold_search? I'm running into issues generating multimer alignments with the archive hosted on the download site (downloaded and untarred).

KPHippe · Sep 05 '25 16:09

I downloaded the SPIRE database but didn't have enough disk space to try it, and there were other reasons I gave up on SPIRE. I wanted it because Boltz structure prediction queries the ColabFold MSA server and requests sequence pairing using the env database, which the MSA server code indicates uses SPIRE. But Milot Mirdita told me that the ColabFold MSA server does not actually use SPIRE: "The env pairing is very surprising since we don't actually offer this database in our servers and it will just crash if you request an env-paired MSA. As Martin mentioned in the GH issue, this was just an experiment for CASP that didn't go particularly well (possibly due to some bugs, we never dug too deep)." Since Milot indicated that SPIRE-based pairing didn't give good results and isn't offered by the ColabFold MSA server, I did not try it on my local machine.

tomgoddard · Sep 05 '25 19:09

I think the hosted database is missing the mapping file necessary for pairing. I generated an ad hoc mapping file using sample-level IDs (also creating dummy dmp files), which pairs things correctly for my test cases (positive controls from a known operon, i.e. neighbours on the same contig). I suspect mapping at the level of SPIRE MAGs is the intended strategy in general. It looks possible to reproduce from public SPIRE data, but I haven't done it yet.

tlitfin-unsw · Sep 07 '25 12:09
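As a rough illustration of the ad hoc approach described above, the Python sketch below assigns every contig an arbitrary unique taxon ID and writes a two-column mapping file plus minimal dummy NCBI-style dump files. It is not the file shared in this thread and not an official ColabFold or MMseqs2 artifact. Assumptions: protein IDs follow the Prodigal convention `<contig>_<gene index>`; the output file name `taxid_mapping.tsv` and the helpers `contig_of`, `build_mapping`, and `write_files` are hypothetical; the `species` rank for each contig taxon is a guess; and whether MMseqs2's taxonomy tooling accepts this minimal dump layout should be checked against its documentation before use.

```python
import tempfile
from pathlib import Path


def contig_of(protein_id: str) -> str:
    """Return the contig ID for a protein ID (hypothetical helper).

    ASSUMPTION: IDs are Prodigal-style "<contig>_<gene index>", so the contig
    is everything before the last underscore.
    """
    return protein_id.rsplit("_", 1)[0]


def build_mapping(protein_ids):
    """Assign every distinct contig an arbitrary unique taxon ID.

    IDs start at 2 because taxon 1 is reserved for the dummy root node.
    Returns (list of (protein_id, taxid), dict of contig -> taxid).
    """
    contig_taxid = {}
    mapping = []
    for pid in protein_ids:
        contig = contig_of(pid)
        if contig not in contig_taxid:
            contig_taxid[contig] = len(contig_taxid) + 2
        mapping.append((pid, contig_taxid[contig]))
    return mapping, contig_taxid


def write_files(out_dir, mapping, contig_taxid):
    """Write a TSV mapping plus dummy NCBI-style dump files (hypothetical layout)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Two columns: sequence identifier, taxon ID.
    with open(out / "taxid_mapping.tsv", "w") as fh:
        for pid, taxid in mapping:
            fh.write(f"{pid}\t{taxid}\n")
    # NCBI dump rows use "\t|\t" separators and end with "\t|". Only the
    # leading columns are filled; every contig taxon hangs directly off root,
    # and the "species" rank is an assumption.
    with open(out / "nodes.dmp", "w") as nodes, open(out / "names.dmp", "w") as names:
        nodes.write("1\t|\t1\t|\tno rank\t|\n")
        names.write("1\t|\troot\t|\t\t|\tscientific name\t|\n")
        for contig, taxid in contig_taxid.items():
            nodes.write(f"{taxid}\t|\t1\t|\tspecies\t|\n")
            names.write(f"{taxid}\t|\t{contig}\t|\t\t|\tscientific name\t|\n")
    # Empty placeholders in case a tool also expects these files (assumption).
    for name in ("merged.dmp", "delnodes.dmp"):
        (out / name).write_text("")


if __name__ == "__main__":
    ids = ["ctgA_1", "ctgA_2", "ctgB_1"]
    mapping, contig_taxid = build_mapping(ids)
    print(mapping)  # [('ctgA_1', 2), ('ctgA_2', 2), ('ctgB_1', 3)]
    with tempfile.TemporaryDirectory() as tmp:
        write_files(tmp, mapping, contig_taxid)
        print(sorted(p.name for p in Path(tmp).iterdir()))
```

Changing `contig_of` to return a sample or MAG identifier instead would give the coarser sample-level or MAG-level grouping discussed in this thread.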

@tlitfin-unsw we generated a taxonomy lookup at the contig level, not the MAG level. MAGs are a bit unsafe IMO.

martin-steinegger · Sep 07 '25 14:09

OK, thanks Martin, that is good to know! Reproducing a mapping based on contig IDs is much easier 😄. My intuition was that when paired depth is low, more sensitive MAG-based pairing has potential upside, and any mis-pairing probably leaves you no worse off (i.e. it is unlikely to produce spurious signal by chance).

tlitfin-unsw · Sep 07 '25 14:09
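To make the contig-versus-MAG trade-off concrete, here is a simplified Python sketch of pairing by shared taxon label: for every taxon with hits in both chains, the best-scoring hit from each chain is paired. This is an illustration only, not the pairing implementation used by MMseqs2 or ColabFold; the helper names, sequence IDs, scores, and taxon numbers are all made up.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # (sequence ID, alignment score)


def best_per_taxon(hits: List[Hit], taxon_of: Dict[str, int]) -> Dict[int, Hit]:
    """Keep the highest-scoring hit for each taxon; unmapped hits are dropped."""
    best: Dict[int, Hit] = {}
    for seq_id, score in hits:
        taxon = taxon_of.get(seq_id)
        if taxon is None:
            continue  # no taxon label, so this hit cannot be paired
        if taxon not in best or score > best[taxon][1]:
            best[taxon] = (seq_id, score)
    return best


def pair_hits(hits_a: List[Hit], hits_b: List[Hit],
              taxon_of: Dict[str, int]) -> List[Tuple[str, str]]:
    """Pair the best chain-A and chain-B hits that share a taxon label."""
    best_a = best_per_taxon(hits_a, taxon_of)
    best_b = best_per_taxon(hits_b, taxon_of)
    shared = sorted(best_a.keys() & best_b.keys())
    return [(best_a[t][0], best_b[t][0]) for t in shared]


if __name__ == "__main__":
    hits_a = [("ctg1_1", 50.0), ("ctg3_1", 40.0)]
    hits_b = [("ctg1_2", 45.0), ("ctg2_5", 30.0)]
    # Contig-level labels: one taxon per contig.
    by_contig = {"ctg1_1": 10, "ctg1_2": 10, "ctg2_5": 11, "ctg3_1": 12}
    # MAG-level labels: ctg2 and ctg3 are (hypothetically) binned into one MAG.
    by_mag = {"ctg1_1": 100, "ctg1_2": 100, "ctg2_5": 200, "ctg3_1": 200}
    print(pair_hits(hits_a, hits_b, by_contig))  # [('ctg1_1', 'ctg1_2')]
    print(pair_hits(hits_a, hits_b, by_mag))     # [('ctg1_1', 'ctg1_2'), ('ctg3_1', 'ctg2_5')]
```

With MAG-level labels the extra pair (`ctg3_1`, `ctg2_5`) appears, which is the added sensitivity described above. That pair is only meaningful if the binning that grouped those two contigs is correct, which is the risk of MAG-based labels.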

Thanks all for the insight here, this is very helpful. @martin-steinegger or @tlitfin-unsw, could you point me to the taxonomy DB/mappings you used?

KPHippe · Sep 08 '25 14:09

It's better to use the official file from @martin-steinegger if it is available, but I sent you the file I have been using via FileSender. I assigned each contig an arbitrary unique taxon ID.

tlitfin-unsw · Sep 10 '25 06:09

Thank you!

KPHippe · Sep 10 '25 14:09