Issue with Cyclic Peptide Generation
Hello, I am a student currently studying protein design in South Korea, and I have been looking into the updated RFpeptide binder design functionality—particularly the generation of cyclic peptides.
When I ran the example script examples/design_macrocyclic_binder.sh, the outputs appeared to be linear peptides, not cyclic ones.
I would be grateful if you could advise whether I may be overlooking any required configurations for cyclic peptide design.
Thank you very much for your time and support.
Hello,
Could you provide your output and any log files?
Running the macrocyclic binder example appears to have generated macrocycles on my machine. (Files attached.) The only modification I made to the example was to reduce the number of designs from 10 to 4.
Hello, thank you for your kind and helpful responses. I ran the example script examples/design_macrocyclic_binder.sh and have attached the output PDB file and log for your review. I would appreciate it if you could take a look. Thank you!
Hello, I am struggling with the same issue. When I ran python ./scripts/run_inference.py inference.output_prefix=cyclic_design/cyclic inference.num_designs=5 contigmap.contigs=[\"8-8\"] inference.cyclic=True inference.cyc_chains="a" diffuser.T=50, I got these errors:
Could not override 'inference.cyclic'.
To append to your config use +inference.cyclic=True
Key 'cyclic' is not in struct
full_key: inference.cyclic
object_type=dict
Could not override 'inference.cyc_chains'.
To append to your config use +inference.cyc_chains=a
Key 'cyc_chains' is not in struct
full_key: inference.cyc_chains
object_type=dict
The program ran without errors when I rewrote these as "+inference.cyclic" and "+inference.cyc_chains", but the results were still non-cyclic.
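The "Key 'cyclic' is not in struct" message is Hydra refusing to override a key that the loaded config does not define. The + prefix adds the key to the config, but that by itself does not guarantee that the code ever reads it. One possible explanation (not confirmed anywhere in this thread) is that the checkout's config simply has no cyclic options. A rough way to check is to look for the keys in the inference defaults. The sketch below is a naive line scan rather than a real YAML parser, and the path config/inference/base.yaml is an assumption about where a given checkout keeps its defaults.

```python
from pathlib import Path


def section_keys(yaml_text, section):
    """Return the keys found under a top-level `section:` block.

    Naive line scan for simple two-level YAML; not a general YAML parser.
    Keys nested more deeply inside the section are reported as well.
    """
    keys = set()
    in_section = False
    for raw in yaml_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop trailing comments
        if not line.strip():
            continue
        if line[0] not in " \t":
            # An unindented line starts a new top-level block.
            in_section = line.endswith(":") and line[:-1].strip() == section
            continue
        if in_section and ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return keys


if __name__ == "__main__":
    cfg = Path("config/inference/base.yaml")  # assumed location, adjust as needed
    if cfg.exists():
        found = section_keys(cfg.read_text(), "inference")
        for key in ("cyclic", "cyc_chains"):
            print(key, "present" if key in found else "missing")
```

If both keys are reported missing, the + prefix would only be adding values that nothing consumes, which would be consistent with getting linear outputs.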
Have you looked at this issue: Issue 359 - it is discussing the same thing.
Yes, I've read that thread, but none of the solutions there seem to work for me :<
[rfpeptide_test.zip](https://github.com/user-attachments/files/21168711/rfpeptide_test.zip)
Hi Rachel,
I downloaded your attachment and noticed that the bonding seems unusual, as shown in the figure. Is the file directly compatible with ProteinMPNN?
Additionally, when I set diffuser.T to 15, the output is no longer a cyclic peptide. Could you advise on a suitable value of T for generating cyclic peptides?
> the bonding seems unusual, as shown in the figure.
Keep in mind that the PDB format doesn't really encode bonds. What's happening here is that PyMOL (like other structure viewers) infers bonding from atom proximity. In your figure the oxygens and nitrogens are close enough to trigger PyMOL's "it's bonded" heuristic, which is why a bond is displayed. But there isn't actually any bond annotation in the PDB file, or at least not any that will trip up downstream programs like ProteinMPNN.
That said, the fact that the backbone geometry is wrenched enough that the oxygen and the nitrogen are that close is probably not the best. If you have a large number of structures, only some of which have this "bonded" configuration, it may be worth pre-filtering the RFdiffusion outputs to eliminate these wrenched structures.
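A pre-filter along those lines could look roughly like the sketch below. It reads backbone N and O coordinates from the fixed-width ATOM records and lists N/O pairs that sit closer than a cutoff. The function names, the default chain "A", and the 2.0 angstrom cutoff are all illustrative assumptions, not anything taken from RFdiffusion. The O of one residue and the N of the next flank the same peptide bond and are always close, so that pair is skipped.

```python
import math


def backbone_n_o(pdb_text, chain):
    """Collect N and O coordinates per residue number for one chain."""
    residues = {}  # residue number -> {"N": (x, y, z), "O": (x, y, z)}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 54:
            continue
        name = line[12:16].strip()
        if line[21] != chain or name not in ("N", "O"):
            continue
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues.setdefault(int(line[22:26]), {})[name] = xyz
    return residues


def close_n_o_contacts(pdb_text, chain="A", cutoff=2.0, cyclic=True):
    """Return (O residue, N residue, distance) for pairs closer than cutoff.

    The cutoff is an illustrative guess, not a validated threshold.
    With cyclic=True the last residue's O and first residue's N are
    treated as peptide-bond neighbours and skipped as well.
    """
    residues = backbone_n_o(pdb_text, chain)
    order = sorted(residues)
    hits = []
    for i, res_o in enumerate(order):
        o_xyz = residues[res_o].get("O")
        if o_xyz is None:
            continue
        if i + 1 < len(order):
            skip = order[i + 1]
        elif cyclic:
            skip = order[0]
        else:
            skip = None
        for res_n in order:
            n_xyz = residues[res_n].get("N")
            if n_xyz is None or res_n == skip:
                continue
            dist = math.dist(o_xyz, n_xyz)
            if dist < cutoff:
                hits.append((res_o, res_n, dist))
    return hits
```

An output file whose peptide chain returns a non-empty list would be a candidate for dropping before sequence design. It is worth tuning the cutoff on a few structures you have inspected by eye first.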
> when I set diffuser.T to 15, the output is no longer a cyclic peptide.
As mentioned in "A note on diffuser.T", RFdiffusion was initially trained on 200 timesteps. However, it was observed that (for most cases, at least for "standard" proteins) you can reduce that to 50 timesteps without much loss in quality, which is why it's the new default. Remember that the diffusion process is an iterative one, where you progressively remove "noise" from the structure. The number of timesteps is the number of times you remove noise -- but since the total amount of noise is the same, fewer timesteps means you have to remove a larger amount of noise with each timestep. At a certain point, RFdiffusion can't effectively remove all the noise needed.
So it's a tradeoff between how long you're willing to let the program run and how effectively you want to denoise things. You're not going to make the structures worse by increasing the diffuser.T setting - it will just take longer to produce each output model. So if you're getting bad geometries with a low diffuser.T setting, I'd recommend increasing it and seeing what effect it has. If diffuser.T=15 isn't working for you, I might try 30 and see how things look. If they're still bad, try 50. If they're looking good at 30, you might be able to speed things up by dropping down to 25 or 20 while still having decent results. (To some extent it comes down to how much time you want to spend optimizing the setting versus how much time it would take just to produce the models with a larger diffuser.T setting.)
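If you want to try a few values side by side, a small script can generate one command per diffuser.T setting. The sketch below only builds and prints the command lines; it does not run anything. The options are copied from the command posted earlier in this thread, so they may need adapting to your checkout (for example if the cyclic keys are rejected), and the "_T<value>" suffix on the output prefix is just a hypothetical naming choice to keep the runs apart.

```python
import shlex


def sweep_commands(t_values, prefix="cyclic_design/cyclic", num_designs=5):
    """Build one run_inference.py command line per diffuser.T value."""
    commands = []
    for t in t_values:
        args = [
            "python",
            "./scripts/run_inference.py",
            f"inference.output_prefix={prefix}_T{t}",  # hypothetical naming
            f"inference.num_designs={num_designs}",
            'contigmap.contigs=["8-8"]',
            "inference.cyclic=True",
            "inference.cyc_chains=a",
            f"diffuser.T={t}",
        ]
        # shlex.join quotes the contigs argument so the shell passes it intact.
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    for command in sweep_commands([15, 30, 50]):
        print(command)
```

Keeping the number of designs small per setting makes it cheap to see where the geometry starts to degrade before committing to a full run.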
P.S. It could be that the bad geometry/"bonds" you're seeing in the figure is a symptom of a low diffuser.T setting -- RFdiffusion isn't able to properly correct the geometry because it's trying to do too much in each step.
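To compare runs at different diffuser.T settings without opening every file in a viewer, one option is to measure the distance between the N of the first residue and the C of the last residue in the peptide chain. A peptide bond is about 1.33 angstroms long, so a head-to-tail cyclized backbone should have its termini roughly that far apart. This is a minimal sketch: the function names, the default chain "A", and the 1.6 angstrom tolerance are assumptions for illustration, and the check only looks at distance, not at whether the closure geometry is otherwise sensible.

```python
import math


def termini_distance(pdb_text, chain="A"):
    """Distance between the first residue's N and the last residue's C."""
    n_atoms, c_atoms = {}, {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 54 or line[21] != chain:
            continue
        name = line[12:16].strip()
        if name not in ("N", "C"):
            continue
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        target = n_atoms if name == "N" else c_atoms
        target[int(line[22:26])] = xyz
    if not n_atoms or not c_atoms:
        raise ValueError(f"no backbone N/C atoms found for chain {chain!r}")
    return math.dist(n_atoms[min(n_atoms)], c_atoms[max(c_atoms)])


def looks_cyclic(pdb_text, chain="A", max_bond=1.6):
    """True if the termini are within max_bond angstroms (assumed tolerance)."""
    return termini_distance(pdb_text, chain) <= max_bond
```

A linear output would typically show a termini distance of several angstroms or more, so the two cases should be easy to tell apart even with a loose tolerance.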
Hi Rocco, Thanks a lot for your reply! Your explanation really cleared things up for me.
Hi Nicole, did you solve the "unusual bond" problem? I am practicing the RFpeptide protocol with my own target and am running into the same issue.
Hi DazLe, you can check Rocco’s answer above. For me, I just used a higher diffuser.T value.
Hi Nicole, could you share which diffuser.T value you used, please?