MGnify database hits not found on MGnify website

Open twaksman001 opened this issue 2 years ago • 7 comments

On the Foldseek website, many of the hits with the highest bit scores are from the MGnify database, but I cannot find these proteins on the MGnify website when searching by either the identifier (MGYP + digits) or the protein sequence.

twaksman001 avatar Jan 12 '23 15:01 twaksman001

Could you please post a link to a search result?

martin-steinegger avatar Jan 12 '23 16:01 martin-steinegger

Is the file I uploaded appropriate?

twaksman001 avatar Jan 23 '23 12:01 twaksman001

I tried to access the top three hits on the ESM Atlas and could open all of them: (1) https://esmatlas.com/explore/detail/MGYP001476177674 (2) https://esmatlas.com/explore/detail/MGYP001575997564 (3) https://esmatlas.com/explore/detail/MGYP002782136678 Which ID does not work?

martin-steinegger avatar Jan 26 '23 15:01 martin-steinegger

@twaksman001 The ESM2 models are not integrated into the MGnify database and are not easily searchable on the EBI site, unlike the AlphaFold2 models, which are integrated with the EBI sequence databases.

If you want to download a model, either open it as in the previous comment by @martin-steinegger or use the ESM2 Atlas API (see: https://esmatlas.com/about#api), as in:

aria2c https://api.esmatlas.com/fetchPredictedStructure/MGYP001476177674

Rolando-at-Arzeda avatar Feb 19 '23 01:02 Rolando-at-Arzeda

@yeojingi you might be able to help here.

martin-steinegger avatar Apr 26 '23 16:04 martin-steinegger

The MGYP ID was introduced in the latest MGnify paper, but the MGnify website does not seem to support interactive exploration of MGYPs. You can, however, access the FTP storage and download the annotations (Pfam, biomes). For information about the predicted structures (pLDDT, pTM, model versions), the ESMFold documentation provides the metadata.

yeojingi avatar Apr 27 '23 04:04 yeojingi