Questions about using the MSA part alone

Open zxzhang8 opened this issue 3 years ago • 0 comments

Thank you for sharing the ColabFold repo; it helps a lot. Is there an API or method for extracting an MSA from a protein sequence alone, without running structure prediction? To try the MSA API, I ran run_mmseqs2() on its own and got:

>101
MMTRRKKRSCSNQKKEEEKSERIPFDLVIEILLRLPVKSIARFRYVSKLWQSTLRGQHFTESYLTISSSRPKILFTCLKDCETFFFSSPHPQDLSPIAANLHMSFPISCPSNICRPVRGWLCGLHQRTTKGTTVTEPLICNPSTGESVVLRKVKTRRKGVISFLGFDPIDKNFKVLCMTRSCIGRADSEEHQVHTLETGKKPSRKMIECDILHYPVPVEHTNGFSQYDGVCINGVLYYLAIVHGVSDDRYPDVVCFEFGSDKFKYIKKVAGHDMEILYLGRRLNSILVNYKGKLAKLQPNMPNNVCTGIQLWVLEDAEKHEWSSHIYVLPPPWRNVYEETKLCFVGTTRKGEIVLSPNTISDFFYLLYYNPDRNTITIVKIKGMETFQSHKAYTFLDHLEDVNLVPIWRM
>A0A654FJR0	459	0.992	2.597E-140	0	409	410	0	409	410
MMTRRKKRSCSNQKKEEEKSERIPFDLVIEILLRLPVKSIARFRYVSKLWQSTLRGQHFTESYLTISSSRPKILFTCLKDCETFFFSSPHPQDLSPIAANLHMSFPMSCPSNICRPVRGWVCGLHQRTTKGTTVTEPLICNPSTGESVVLRKVKTRRKGVISFLGFDPIDKNFKVLCMTRSCIGRADSEEHQVHTLETGKKPSRKMIECDILHYPVPVEHTNGFSQYDGVCINGVLYYLAIVHGVSDDRYPDVVCFEFGSDKFKYIKKVAGHDMEILYLGRRLNSILVNYKGKLAKLQPNMPNNVCTGIQLWVLEDAEKHEWSSHIYVLPPPWRNVYEETKLCFVGTTRKGEIVLSPNTISDFFYLLYYNPDRNTITIVKIKGMETFQIHKAYTFLDHLEDVNLVPIWRM
>UPI000A29B610	417	0.881	1.390E-125	0	403	410	0	404	414
MMTRRKKRSCSNPKKEEvVKSEPIPFDLVIEILLRLPAKSIARFRYVSKLWQSTLRGPHFTESFLTLSSSRPKILFTCLKDGETFFFSSPHTQDLSPISANIHMSFPVNCPSNICRPVHGWVCGSHQRTTKGTTVTVPLICNPSTGESLALCKVKTRRKGVISFLGFDPIDKKFKVLCMTRAYVGRADSEEHQVLTLETGKKPSRKMIECDILHYPTVVEHTNGFSQYDGVCINGVLYYLAIVHGVSDHRYPDVVCFEFRSDKFNYIKKVAGPGME-MYLRGQLDSTLVNYKGKLAKLQPNMSNNgVCTGIQLWVLEDAEKHEWSSHIYVLPPPWRNVYEETKLCFVGTTRKGEIVLSPNTISNFFYLLYYNPERNTITIVKIKGLETFKSHKAYTFVDHLEDVKL------
>D7LS85	414	0.881	1.251E-124	1	403	410	0	403	714
-MTRRKKRSCSNPKKEEvVKSEPIPFDLVIEILLRLPAKSIARFRYVSKLWQSTLRGPHFTESFLTLSSSRPKILFTCLKDGETFFFSSPHTQDLSPISANIHMSFPVNCPSNICRPVHGWVCGSHQRTTKGTTVTVPLICNPSTGESLALCKVKTRRKGVISFLGFDPIDKKFKVLCMTRAYVGRADSEEHQVLTLETGKKPSRKMIECDILHYPTVVEHTNGFSQYDGVCINGVLYYLAIVHGVSDHRYPDVVCFEFRSDKFNYIKKVAGPGME-MYLRGQLDSTLVNYKGKLAKLQPNMSNNgVCTGIQLWVLEDAEKHEWSSHIYVLPPPWRNVYEETKLCFVGTTRKGEIVLSPNTISNFFYLLYYNPERNTITIVKIKGLETFKSHKAYTFVDHLEDVKL------
>UPI00053981B0	361	0.708	2.221E-106	0	405	410	0	407	413
MMTRRKTRSCSNPRKEEvVKPEPIPFDLVIEVLLRLPVRSVARCRSVSKLWNSTLEGPHFTELFFTLSSSRPKILFTCLKGDETVFFsSSRNPQDLS-IDANIRMSFPINSSSHICRPVRGWLCGLH-RTTKGATVTVPLICNPSTGESVPLPTVKTRRKVVISFFGYDPMEKTFKALCMTRSSVGGEDTPsgEHQVLTLGTGKTsSSREMIDCDILHHPAVVEETNGFCQYDAICINGVLYYLAVVHDVFDG-HPDIICFEFESKKFSYIKK-ADHSMGMYSGGYGLESTLVNYKGKLTQLQPNYSnDRICNGIQLLVLEDAAKHRWSTYIYVLPPPWRNMYRDTKLCFVGTTSRGEIVLSPNTISRFFYLLYYSPERNTIQIVKIKGLETFKGHKAYIFLDHVEDVKLVP----
>UPI00053A3292	360	0.698	5.686E-106	0	405	410	0	409	415
MMTRRKTRSCSKPRNEEEvvKPEPIPLDLVIEVLLRLPVRSVARCRSVSKLWNSTLEDPHFTESFFTLSSSRPKILFTCLKGDETVFFsSSPNLQDLS-TYANIRMSFPINSSSHICRPVRGWVCGLH-RTTKGATVTVPLICNPSTGESVALPTVKTRRKVVICFFGYDPIEKTFKALCMTRSSLGGEDTPsgEHQVLTLGTGKTsSSREMIDCDILHHPAVVEETNGFCQYDAICINGVLYYLAVVHDVFDG-TPDIVCFEFESQKFSYIKK-ADHGMGMYSGGYGLESTLVNYKGKLAKLQPNYSnvDRIYDGIQLLVLEDAAKHQWSSYIYVLPPPWRNIYKDDTLCFVGTTSKGEIVMSPNTISGFFYLVYYSPERNTIQIVKIKGLETFKGHKAYTFLDHVEDVKLVP----
>UPI000CD4DC39	359	0.691	1.455E-105	0	403	410	0	405	413
MMTRRKTRSCSNPRNEEVKPEPIPFDLVIEILLRSPVKSIARFRKVSKLWESTLRGPQFTESFFTLSWSRPKILFTCLKDGETVFFSLpqPHPQDPSIITANIHMSFPINCSSHICRPVRGLVCGLHRRKTKGATSTVPLICNPSTGESFPLHKVNTRRKAVISFFGYNPIDKSFKVLSMTRSSGGLSHSGEHQVLTFKTGTKgSSRKMIECDILHHPSVVEQTNGFCQYDGICINGALYYLAVVYAVS-NGYPDVVRFDLESEKFSYIK--RADHVVETYSGGHLEPTLVNYKGRLGKLHPSYSnDRACTGIQLLVLEDAGKHQWSSYIYVLPPPWMNIYDvKTKFCFVGTTVEGDIVLSPNTISDFFYLLFYSPERNTINIVGIKGMESFKGHKAYAFLDHVEDVKL------
>UPI000CED1DFB	359	0.656	1.455E-105	0	403	410	0	396	427
MMPRRKTRSF------LAKIEPIPFDLVIEILLRLPVKSIATFRRVSKLWASTLRDPSFTESYLTISSSRKKLLFTCLKDDETCFFsSSPNSQSPSSdISAKVHMSFPINCPTNICRPVRGLVCGLNQrRPSKGRTVTVPLICNPSTGQSLALPDVRTRGKRVISCFGYDPIDKQFKVLCMTLPYVGXSSSQDHQVLTLGTQKKPSWKMIKCEVPHIPVDFEHTNG-----GVCINGVLYYLAILLHVdaYTDGYFDIVSFDIRSEKFSYIKT--AVTGMRIHXGEKLESTLVNYKGKLAKLQRNIDDYgTYTGIQLWVLEDAEKHEWSSYIYVLPPPWKNIFEETTLCFVGTTSKGEIVLSPNTISDSFYLLYHNPETKTITKVGVQGMEAYKGHKAYTFLDHVEDVTL------
>A0A6D2IA77	357	0.629	6.970E-105	1	404	410	0	406	410
-MTRRRKAR-SLPVV---VFEQIPFDLVIEILLRSPVKSIGRFRSVSKLWESTIRSPDFKESFRAISSSRNNLLFTCLKDGETYFFSSPrpqvHPQKLpSPIAANVHMSFPINCPTGVCRPVRGLVCGLRQQTSKEGTVTVPLICNPSTGESLALPKVRTTKKGVMSCFGYDPIDKQFKVLCMTLSSEGGlPNSAEHQLLTLeEAKEKHSWKMIECYVRHYPYFAEHTNGFYLHDGICINGVLYYVAIVFHDEFDGYPDIACFDIRSEKFSYIK--KADEGMNVNVGEKLESTLVNYKGKLAKLQPNIGNNneGYNGIQLWVLEDAEKHEWSRHIYVFPLHRKSIFEKTRLCFVGTTSTGEIVLSPNTISDSFYLIYYNPERNTLKRVEVRGMEAYKSCKAYTFLDHVEDVTLL-----
>UPI00053B3BD0	353	0.694	1.168E-103	0	405	410	0	408	414
MMTRRKTRSCSKPRNEEvVKPEPIPFDLVIEVLLRLPVRSVARCRSVSKLWNSTLEGPHFTESFFTLSSFRPKILFTCLKGDETVFFsSSPNPQDLS-IAANIRMSFPINSSSHICRPVRGWLCGLH-RTTKGATVTVPLVCNPSTGESVPLPTLKTRRKVVITFFGYDPIVKTFKALCMTRSSVGGEDSPsgEHQVLTLGTGKTsSSREMIDCHILHHPAVVEETNGFCQYDAICINGVLYYLAVVHDVFEG-HPDIVCFEFESKKFSYIKK-ADHSMGMYSGGYGLEPTLVNYKGKLAKLQPSYSnvDRICNGIQLLVLEDAVKHQWSSYIYILPPPWMNLYGDTKLCFVGTTSRGEIVLSSNTISRFFYLLYYSPERNTIQIVKVKGLETFKGHKAYIFLDHVEDVKLVP----
>A0A654EDH0	340	0.591	4.906E-99	0	401	410	0	402	412
MKTRRNTRSCSNsSKREEKNSETIPFDLVIEILTRLPVKSIARFRCLSKLCASTLNNPDFTESFFTISSSRPKLLFTCPKDGETFFFSSPKPRDSSPLVVNFHMSFSINHLCGICRPVCGFIYGFNSHtNLKGRTISKPLICNPSTGESWPLPRVKTNRTIITSFFGYDPINKEFKVLCMTKSKFG--VFGEHQVLTFGTGKELSWRKIKCDMAHYPEVVDYeASGYPRplYDGICINGVLYYLGRVHDDLDG-FPDMVCFDIKFEKFSYIKKANGMKRN---SGVNLQPTLVNHKGKIAKLQANIGPGsiRYTGIQLWVLEDAEKHQWSSYIYVVPPPWKNIIEETKLRFVGTSDTGDIVLSPCNISNSFYLLYYNPERNAIARVEIQGMEAFKTHKSYAFLDYAENI--------
>UPI000CD4B51D	334	0.548	3.923E-97	0	406	410	0	409	412
MKTRRNTRSCSNSRNRAEKTSetiHLPFDLVIEIFMRLPAKSVARFHCLSKLCASTLSNPNFTDAFFIRSSSRPKLLFNCPKDGETFFFSSPKPrDDSSPLAVSFHKSFPINRPFDICRPVSGFVYgFNYHKTSTGRTVSVPLICNPSTGQSWTLPSVKTNRTIITSYFGYDPIDKEFKVLCMTQSYLGEF--GEQKVLTLGTGKKLSWRKIKCDMQHFPCPVEGEPnhHYPLYDGICIDGVLYYLGMVRGDADG-FPDIICFDIKSEKFSYVK--KTHGMER-NSGSVLEQTLVNYKGKIAKFQPkfDEHGTILTGIQLYVLEDAEKHQWSSYIYVMPPPWKSIVEETKLRFVGTSDTGEIVLSPYNISDSSYLLSYDPERNTLTKVGIKGMEALKPHKSYAFLDHVENVVKIEP---
>A0A087GJI2	325	0.479	5.230E-94	22	403	410	27	395	403
----------------------LPIDLIIEILSRLPAKSIARCRCVSKLWGSIIRSQVFTELVLTRSATtQPHLLFACEKNGEVFFYSSPQNpyEKSSPITANYHMKFPFDDDDFVLRPVHGLICLKQIRIFKGRNTTALMICNPSTGQSLTLPRVKTRRVDVMSFLGYDPVGKQFKLLSMTSSISGSnRVSAEHQILTLGNGKL-SWRKIECSTPHYPL----------SRGICINGVLYYPAEDKCI--EGKFRIACFDIRSEKFKLIKRVDEV----------VRGKLVNYKGKLATLRTDtSPFSICRrsrSFELCVLEDAEKHEWSTHTYVLPPLSTDLVSSSGMFFQGVTRRGEIVLSPpsYYPSDPFYLLYYNLERNTFVKVEIQGIHMHVRHKVYTFVDHVENVKL------
>UPI000901B5A9	325	0.558	5.230E-94	3	402	410	15	415	416
---RRRKRKGEERERRERERERVPFDLVIEILTRLPAKSVARFRCLSKVCASTLSNPVFIDSFSTISSSRPKLLFTCPKDGKTFFFSSPKPRDSSPLAVDFQTSFPINRPFDICRPVCGFVYGFNIHKTKGRTVSVPLICNPSKGKSWTLPRVKTNRTIITSYFGYDPicXDKEFKLLCMTRSKFGFF--EEHQVLTLATGKKLSWRKIECDMAHSPCPcpveGEAGHNYPLYDGICINGVLYYLGMVF----DGFPDIICFDIKSEKFSYAKKAHGME---LNSGSKLQPTLVNYKGKIAKFQPNFNPDYTliTGIQLWVLEDAEKHQWSSYIYVMPPPWKDIIEETKLRFVGASDTGEIVLSPYNisESDSSYLLYYDPERNTMTRVGIQGMEALKSHKAYAFLDHVKNIS-------
>V4L0Y9	322	0.625	4.670E-93	0	403	410	0	361	364
MMPRRKTRSF------LAKIEPIPFDLVIEILLRLPVKSIATFRRVSKLWASTLRDPSFTESYLTISSSRKKLLFTCLKDDETCFFsSSPNSQSPSSdISAKVHMSFPINCPTNICRPVRGLVCGLNQrRPSKGRTVTVPLICNPSTGQSLALPDVRTRGKRVISCFGYDPIDKQFKVF-----------SQDHQVLTLGTQKKPSWKMIKCEVPHIPVDFEHTNG-----GVCINGVLYYLAILLHVDAYTDGY---FDIG-EK--------------------LESTLVNYKGKLAKLQRNIDDYgTYTGIQLWVLEDAEKHEWSSYIYVLPPPWKNIFEETTLCFVGTTSKGEIVLSPNTISDSFYLLYHNPETKTITKVGVQGMEAYKGHKAYTFLDHVEDVTL------
>A0A5S9X2X0	321	0.449	1.193E-92	15	404	410	17	386	387
...
...
>MGYP000274109773	42	0.261	8.283E+00	22	61	410	15	56	204
----------------------LPFELACQILtsEHLDAMSLVRSSQVCKSWKQMCDNDEIWRK------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>M4FBP7	42	0.350	8.283E+00	8	79	410	34	110	293
--------SKQKNTSDETNSDPFPSDLLMEILKLFPVKTLARLTCVSKLWASTIRRqefNKLWSssNQQRRSSSSNTLIFAFKRD------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

My question is: how should I interpret the output above, which is what run_mmseqs2 returned when I used it to call the interface?
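
For context, the output above looks like an alignment in A3M format: the first record (>101) is the query, and each later record is a hit aligned to it. In A3M, uppercase letters and "-" occupy query columns, while lowercase letters are insertions relative to the query. Below is a minimal parsing sketch, not ColabFold's own code. The helper names (parse_a3m, strip_insertions) and the field names are hypothetical, and the meaning I assign to the nine header columns (bit score, sequence identity, E-value, query start/end/length, target start/end/length) is an assumption based on how the numbers above line up, not something confirmed here.

```python
from typing import Dict, List, NamedTuple, Optional

# Assumed meaning of the nine tab-separated header columns (hypothetical names).
FIELDS = ["bit_score", "seq_id", "e_value",
          "q_start", "q_end", "q_len",
          "t_start", "t_end", "t_len"]


class A3MRecord(NamedTuple):
    name: str
    stats: Optional[Dict[str, float]]  # None when the header has no stats (e.g. the query)
    sequence: str                      # raw A3M row, insertions included


def parse_a3m(text: str) -> List[A3MRecord]:
    """Split A3M text into records, parsing header statistics when present."""
    records: List[A3MRecord] = []
    name, stats, chunks = None, None, []

    def flush() -> None:
        if name is not None:
            records.append(A3MRecord(name, stats, "".join(chunks)))

    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()
            parts = line[1:].split()
            name, chunks = parts[0], []
            if len(parts) == 1 + len(FIELDS):
                values = [float(p) for p in parts[1:4]] + [int(p) for p in parts[4:]]
                stats = dict(zip(FIELDS, values))
            else:
                stats = None
        elif name is not None:
            chunks.append(line)
    flush()
    return records


def strip_insertions(sequence: str) -> str:
    """Drop lowercase insertion states so the row has one symbol per query column."""
    return "".join(c for c in sequence if not c.islower())
```

With this, strip_insertions applied to any hit row should give a string as long as the query, which is a quick way to check that the file parsed as expected.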

zxzhang8, Jul 06 '22 03:07