ColabFold
Can ColabFold be used to predict loops while leaving the rest of the protein untouched?
I have a protein structure in which some loops are missing amino acids. Is there a way to use ColabFold to model these loops?
You can use our advanced notebook for this!
https://colab.research.google.com/github/sokrypton/ColabDesign/blob/gamma/af/examples/predict.ipynb
msa_method=single_sequence template_mode=custom
If the sequence of the template matches the query and no MSA is used, the output should copy the template in the regions that are defined in it.
Thank you for your help. I checked the notebook. Just to be sure: my case involves a homotetramer with the same loop missing in each copy. So, in addition to the settings you sent, should I add the sequence 4 times, separated by ":"?
In your case, just set copies=4 and propagate_to_copies=False, and list out all chains from the PDB. See the end of the notebook for instructions.
I followed the instructions, but it produced an error related to the PDB file.
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
[<ipython-input-2-23535cc0c9af>](https://localhost:8080/#) in <cell line: 171>()
174 for pdb,chain in zip(pdbs,chains):
175 query_seq = "".join(u_sequences)
--> 176 batch = predict.get_template_feats(pdb, chain,
177 query_seq=query_seq,
178 query_a3m=template_msa,
2 frames
[/content/colabdesign/af/contrib/predict.py](https://localhost:8080/#) in get_template_feats(pdbs, chains, query_seq, query_a3m, copies, propagate_to_copies, use_seq, use_dgram, get_pdb_fn, align_fn)
113 if isinstance(chain,str): chain = chain.split(",")
114 for c in chain:
--> 115 info = prep_pdb(pdb_filename, c, ignore_missing=True)
116 N.append(n)
117 X.append(info)
[/content/colabdesign/af/prep.py](https://localhost:8080/#) in prep_pdb(pdb_filename, chain, offsets, lengths, ignore_missing, offset_index, auth_chains)
429 # go through each defined chain
430 for n,chain in enumerate(chains):
--> 431 pdb_str = pdb_to_string(pdb_filename, chains=chain, models=[1], auth_chains=auth_chains)
432 protein_obj = protein.from_pdb_string(pdb_str) #, chain_id=chain)
433 batch = {'aatype': protein_obj.aatype,
[/content/colabdesign/shared/protein.py](https://localhost:8080/#) in pdb_to_string(pdb_file, chains, models, auth_chains)
181 old_lines = pdb_file.split("\n")
182 else:
--> 183 with open(pdb_file,"rb") as f:
184 old_lines = [line.decode("utf-8","ignore").rstrip() for line in f]
185 for line in old_lines:
FileNotFoundError: [Errno 2] No such file or directory: 'tmp/AF-tetramer-F1-model_v4.pdb'
I entered the name of the homotetramer's PDB file (tetramer) in the pdb option and entered the 4 chains as A,B,C,D.
These are the options i used:
msa_method: single_sequence
pair_mode: unpaired_paired
filtering options (left the same as default)
template_mode: custom
pdb: tetramer
chain: A,B,C,D
I think I have seen a similar FileNotFoundError when the template file name contained uppercase letters. Can you change it to lowercase? You may also need to rename it to four characters, e.g. 1xxx.pdb, but I am not sure about that.
The error was solved by doing this:
!cp tetramer.pdb tmp/AF-tetramer-F1-model_v4.pdb
I copied my PDB file into the tmp folder under the file name the notebook was looking for.
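For anyone who prefers to do the same copy from a Python cell, here is a minimal sketch of the workaround above. The helper name stage_template is made up for illustration and is not part of ColabDesign; the two paths are the ones from this thread and assume the cell runs from the notebook's working directory.

```python
import os
import shutil

def stage_template(src_pdb, expected_path):
    """Copy src_pdb to the path the notebook tried to open.

    Hypothetical helper, not a ColabDesign function. It creates the
    destination folder if it does not exist yet, then copies the file.
    """
    folder = os.path.dirname(expected_path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    shutil.copy(src_pdb, expected_path)
    return expected_path

# Example call with the names from this thread (left commented out so the
# cell does not fail when tetramer.pdb is absent):
# stage_template("tetramer.pdb", "tmp/AF-tetramer-F1-model_v4.pdb")
```

This only puts a file where the failing open() call looked for it; it does not change which template the notebook believes it is using.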
However, when I ran the cell with copies = 4, it crashed because it ran out of memory. I therefore reduced it to 2, and it ran without error, but the result modified the whole protein.
The green and cyan cartoons are the original protein, while the magenta one is the AlphaFold output. As can be seen, the relative position of the second monomer differs from the original structure.
Can you share a screenshot of the template features (should appear after prep_inputs cells)?
- If you enter "tetramer", it will try to download a protein named "tetramer" from AlphaFoldDB. Leave this field blank (or provide the actual path to the PDB of interest). If it is blank, you will get a prompt to upload a file.
- Make sure you set propagate_to_copies=False (otherwise it will take the first chain and provide it as an independent template for all other copies, ignoring any interchain info).
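Before filling in the chain field, it can help to confirm which chain IDs the uploaded PDB actually contains. The following is a rough sketch that reads the fixed-column ATOM records directly; chain_residue_counts and demo_atom_line are hypothetical helper names, not ColabDesign functions, and the demo coordinates are omitted because only the chain and residue columns are read.

```python
def chain_residue_counts(pdb_text):
    """Return {chain_id: number of distinct residues} from ATOM records.

    Uses the fixed PDB columns: chain ID in column 22, residue number plus
    insertion code in columns 23-27. Chains appear in file order.
    """
    seen = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 26:
            continue
        chain_id = line[21]
        residue_key = line[22:27]
        seen.setdefault(chain_id, set()).add(residue_key)
    return {chain: len(residues) for chain, residues in seen.items()}

def demo_atom_line(serial, chain, resseq):
    """Build a truncated CA-only ATOM line for the demo below."""
    return f"ATOM  {serial:>5}  CA  ALA {chain}{resseq:>4}"

demo = "\n".join([
    demo_atom_line(1, "A", 1),
    demo_atom_line(2, "A", 2),
    demo_atom_line(3, "B", 1),
])
counts = chain_residue_counts(demo)
print(counts)
print("chain field:", ",".join(counts))
```

On a real file, pass the file contents (for example open(path).read()) instead of the demo string. If the tetramer file reports fewer than four chains, the template cannot carry interchain information for all copies, whatever is typed in the chain field.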
These are the template features; propagate_to_copies was not selected:
Looks like only features for chain A were loaded. Did you set: chain="A,B"?
Yes. I ran the cells again, but they gave me the same plot: