model-angelo config.json file

Hi, in the config.json file, I don't quite understand the meaning of "crop": 6 in "ca_infer_args", or of "crop_length": 200 and "aggressive_pruning": false in "gnn_infer_args". Could you please give me some hints? Also, if I would like to prune some short chains, how can I set the threshold?

Thank you!

Jan 24 '24 03:01 chenwei-zhang

Hi!

The meaning of crop can be found in model_angelo/c_alpha/inference.py. Basically, since we are doing inference on boxes of 64 voxels across, the voxels near the edge of each box do not have information about the biological context around them. So, you would expect the segmentation results for these voxels to be worse. The crop argument controls how many voxels to ignore from each side of each box during inference.
crop_length in the GNN is different. Since most of our users have GPU memory constraints, ModelAngelo will only look at groups of crop_length residues that are close together at a time.
aggressive_pruning turns on an algorithm that prunes residues that are not found in the sequence file.
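To make the two crop settings a bit more concrete, here is a rough sketch in plain Python. It is not ModelAngelo's code: the helper names kept_range, kept_fraction and nearest_group are made up for illustration, and picking the crop_length residues nearest to a center residue is only an assumption about what "groups of residues that are close together" could look like.

```python
import math


def kept_range(box_size: int, crop: int) -> range:
    """Indices along one axis that remain after ignoring `crop` voxels per side."""
    if crop < 0 or 2 * crop >= box_size:
        raise ValueError("crop must be non-negative and leave some voxels")
    return range(crop, box_size - crop)


def kept_fraction(box_size: int, crop: int) -> float:
    """Fraction of a cubic box's voxels that are kept after cropping all six faces."""
    kept = len(kept_range(box_size, crop))
    return kept ** 3 / box_size ** 3


def nearest_group(coords, center_index: int, crop_length: int):
    """Hypothetical grouping: indices of the `crop_length` residues nearest to one residue.

    `coords` is a list of (x, y, z) tuples, one per residue.
    """
    center = coords[center_index]
    order = sorted(range(len(coords)), key=lambda i: math.dist(coords[i], center))
    return sorted(order[:crop_length])


if __name__ == "__main__":
    # With the values from this thread: 64-voxel boxes and "crop": 6.
    print(len(kept_range(64, 6)))   # voxels kept per axis
    print(kept_fraction(64, 6))     # share of the box volume that is used
```

With a box of 64 voxels and crop 6, each axis keeps indices 6 to 57, so only the central 52 voxels per axis contribute to the result. A larger crop discards more edge predictions, which presumably means more boxes are needed to cover the same map.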

Could you clarify what you mean by pruning the short chains? Would you like to prune all chains shorter than N residues from the output CIF file? I don't believe there is such an option, but it would be simple for me to write a quick Python script for that if you like.
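As a rough idea of what the filtering step of such a script might look like, here is a minimal sketch. It assumes the residues have already been read from the CIF file into (chain_id, residue_number) pairs; reading and writing the CIF itself is left out and would need an mmCIF library. The function name prune_short_chains and the min_length parameter are hypothetical, and this is not how ModelAngelo itself prunes.

```python
from collections import Counter


def prune_short_chains(residues, min_length):
    """Drop every chain that has fewer than `min_length` distinct residues.

    `residues` is an iterable of (chain_id, residue_number) pairs.
    The order of the surviving entries is preserved.
    """
    residues = list(residues)
    # Count distinct residue numbers per chain, so repeated entries
    # (for example one per atom) do not inflate the chain length.
    counts = Counter(chain_id for chain_id, _ in set(residues))
    return [r for r in residues if counts[r[0]] >= min_length]


if __name__ == "__main__":
    example = [("A", 1), ("A", 2), ("A", 3), ("A", 4), ("B", 1), ("B", 2)]
    # Chain B has only 2 residues, so it is removed when min_length is 4.
    print(prune_short_chains(example, min_length=4))
```

Here min_length plays the role of N: chains shorter than N residues are dropped and everything else is kept unchanged.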

Best, Kiarash

Jan 24 '24 14:01 jamaliki

Hi Kiarash, Thanks for your rapid reply.

I would like to reproduce the structures in the ModelAngelo ICLR paper. In the paper you compared pruned and unpruned predictions, and I am wondering how you do this pruning. I found a sentence in the paper saying "chains shorter than 4 residues are pruned and the resulting coordinates are used as the input". May I ask if this corresponds to the pruned prediction? Is there any way I could customize the cutting threshold of 4 residues? And if I directly use the latest version of ModelAngelo without changing any configuration, will this generate the pruned or the unpruned structures?
For the results you show in the paper, may I ask whether you used the original map (e.g. emd_26126.map) directly as the input for inference, or a post-processed map? If the latter, could you give some details on how you post-processed the map?

Sorry for so many questions, but ModelAngelo is an awesome piece of work and I really appreciate it. Thank you in advance for your answers.

Best, Chenwei

Jan 24 '24 19:01 chenwei-zhang

Hi @chenwei-zhang ,

So pruned and unpruned refer to the output files. The pruned file is output.cif and the unpruned file is output_raw.cif. Changing the cutting threshold is not very clean, although I can point you to the place in the code you would have to change, if you like. Specifically, if you go to the class MatchToSequence, you will find the bulk of the pruning code. I'm sorry it is not very clean :disappointed:
We always used post-processed files as deposited in the EMDB. For example, for EMD-26126, it would be this file: emdb_26126.map. The only post-processing ModelAngelo will really do is to change the pixel size.

Feb 08 '24 16:02 jamaliki
