RFdiffusion

Issue with Cyclic Peptide Generation

Open · kimlab-cnu opened this issue 6 months ago · 11 comments

Hello, I am a student currently studying protein design in South Korea, and I have been looking into the updated RFpeptide binder design functionality, particularly the generation of cyclic peptides.

When I ran the example script examples/design_macrocyclic_binder.sh, the outputs appeared to be linear peptides, not cyclic ones.
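Rather than judging by eye, one quick way to check whether an output is actually head-to-tail cyclic is to measure the distance between the backbone N of the first residue and the backbone C of the last residue of the peptide chain. A closed amide bond puts these two atoms about 1.3 Å apart. The sketch below is a hypothetical helper, not part of RFdiffusion. It assumes the peptide is chain `A` of the output PDB (verify this for your own runs), that residues are listed in sequence order, and that a 2.0 Å cutoff is reasonable; that cutoff is an arbitrary choice, and this thread does not establish how closely RFdiffusion outputs approach ideal bond length.

```python
import math
from typing import Iterable, Optional, Tuple

XYZ = Tuple[float, float, float]


def terminal_gap(pdb_lines: Iterable[str], chain: str = "A") -> float:
    """Distance (Å) between the first backbone N and the last backbone C of `chain`.

    Assumes ATOM records are listed in sequence order (first residue first).
    """
    first_n: Optional[XYZ] = None
    last_c: Optional[XYZ] = None
    for line in pdb_lines:
        # Skip non-ATOM records, truncated lines, and other chains.
        if not line.startswith("ATOM") or len(line) < 54 or line[21] != chain:
            continue
        name = line[12:16].strip()
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        if name == "N" and first_n is None:
            first_n = xyz  # N-terminal nitrogen (first N seen)
        elif name == "C":
            last_c = xyz  # keeps overwriting, so it ends on the C-terminal carbon
    if first_n is None or last_c is None:
        raise ValueError(f"no backbone N/C atoms found for chain {chain!r}")
    return math.dist(first_n, last_c)


def looks_cyclic(pdb_lines: Iterable[str], chain: str = "A", max_gap: float = 2.0) -> bool:
    """Heuristic: treat the chain as head-to-tail cyclic if its termini lie within max_gap Å."""
    return terminal_gap(pdb_lines, chain) <= max_gap
```

For example, `with open(path) as fh: print(looks_cyclic(fh))` on each output PDB would give a quick count of how many designs actually closed.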

I would be grateful if you could advise whether I may be overlooking any required configurations for cyclic peptide design.

Thank you very much for your time and support.

kimlab-cnu · Jul 10 '25

Hello,

Could you provide your output and any log files?

Running the macrocyclic binder example appears to have generated macrocycles on my machine. (Files attached.) The only modification I made to the example was to reduce the number of designs from 10 to 4.

[rfpeptide_test.zip](https://github.com/user-attachments/files/21168711/rfpeptide_test.zip)

rclune · Jul 10 '25

cyclic_test.zip

Hello, thank you for your kind and helpful responses. I ran the example script examples/design_macrocyclic_binder.sh and have attached the output PDB file and log for your review. I would appreciate it if you could take a look. Thank you!

kimlab-cnu · Jul 11 '25

Hello, I am struggling with the same issue. When I ran this command:

```shell
python ./scripts/run_inference.py inference.output_prefix=cyclic_design/cyclic inference.num_designs=5 contigmap.contigs=[\"8-8\"] inference.cyclic=True inference.cyc_chains="a" diffuser.T=50
```

I got these errors:

```
Could not override 'inference.cyclic'.
To append to your config use +inference.cyclic=True
Key 'cyclic' is not in struct
    full_key: inference.cyclic
    object_type=dict
Could not override 'inference.cyc_chains'.
To append to your config use +inference.cyc_chains=a
Key 'cyc_chains' is not in struct
    full_key: inference.cyc_chains
    object_type=dict
```

The program ran once I rewrote the overrides as `+inference.cyclic` and `+inference.cyc_chains`, but the results were still non-cyclic.
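Hydra's "Key 'cyclic' is not in struct" message means the configuration that `run_inference.py` loaded has no `inference.cyclic` entry at all; as the error text says, the `+` prefix only tells Hydra to append a new key. A minimal way to see which keys your checkout's inference config actually defines is to scan the YAML file. This sketch assumes the config lives at `config/inference/base.yaml` and uses a simple indentation scan rather than a real YAML parser. If `cyclic` and `cyc_chains` are absent there, that may point to a checkout that predates the cyclic options, but that is a guess, not something established in this thread.

```python
from pathlib import Path
from typing import Optional, Set


def keys_in_section(yaml_text: str, section: str) -> Set[str]:
    """Return the direct child keys of a top-level `section:` block in block-style YAML.

    Deliberately minimal: strips `#` comments (even inside quotes), ignores list items
    and deeper nesting. It is not a general YAML parser.
    """
    keys: Set[str] = set()
    inside = False
    child_indent: Optional[int] = None
    for raw in yaml_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        if indent == 0:
            # A new top-level key: are we entering the section we care about?
            inside = line.strip() == f"{section}:"
            child_indent = None
            continue
        if not inside:
            continue
        if child_indent is None:
            child_indent = indent  # the first child line fixes the indentation level
        stripped = line.strip()
        if indent == child_indent and ":" in stripped and not stripped.startswith("-"):
            keys.add(stripped.split(":", 1)[0].strip())
    return keys


if __name__ == "__main__":
    cfg = Path("config/inference/base.yaml")  # assumed location inside an RFdiffusion checkout
    if cfg.exists():
        found = keys_in_section(cfg.read_text(), "inference")
        missing = sorted({"cyclic", "cyc_chains"} - found)
        print("missing from inference:", missing if missing else "none")
    else:
        print(f"{cfg} not found; run this from the repository root")
```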

DazLe-Q · Jul 11 '25

Have you looked at Issue 359? It discusses the same problem.

rclune · Jul 11 '25

Yes, I've read that thread, but none of the solutions there seem to work for me :<

DazLe-Q · Jul 12 '25

Hi Rachel,

I downloaded your attachment and noticed that the bonding seems unusual, as shown in the figure. Is the file directly compatible with ProteinMPNN?

Additionally, when I set diffuser.T to 15, the output is no longer a cyclic peptide. Could you advise on a suitable value of T for generating cyclic peptides?

Nicole-DH · Aug 01 '25

> the bonding seems unusual, as shown in the figure.

Keep in mind that the PDB format doesn't really encode bonds. PyMOL (like other structure viewers) infers bonds from atom proximity. In your figure, the oxygens and nitrogens are close enough to trigger PyMOL's "it's bonded" heuristic, which is why a bond is displayed. But there isn't actually any bond annotation in the PDB file, or at least none that would trip up downstream programs like ProteinMPNN.

That said, backbone geometry wrenched enough to put the oxygen and the nitrogen that close together is probably not a good sign. If you have a large number of structures and only some of them show this "bonded" configuration, it may be worth pre-filtering the RFdiffusion outputs to eliminate these wrenched structures.
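The pre-filtering step could be sketched along these lines. This is a hypothetical helper, not an RFdiffusion or ProteinMPNN feature: it reads backbone N and O atoms from a PDB file and reports O-N pairs from different residues that sit closer than a cutoff. The 2.0 Å default is an assumption; keep it below the normal O(i)-N(i+1) spacing across a peptide bond (about 2.25 Å) or every design will be flagged. Restricting `chain` to the peptide (chain `A` here, also an assumption) avoids scanning the target.

```python
import math
from typing import Iterable, List, Optional, Tuple

# (chain, residue number, atom name, coordinates)
Atom = Tuple[str, int, str, Tuple[float, float, float]]


def read_backbone_n_o(pdb_lines: Iterable[str], chain: Optional[str] = None) -> List[Atom]:
    """Collect backbone N and O atoms from ATOM records, optionally for one chain only."""
    atoms: List[Atom] = []
    for line in pdb_lines:
        if not line.startswith("ATOM") or len(line) < 54:
            continue
        name = line[12:16].strip()
        ch = line[21]
        if name not in ("N", "O") or (chain is not None and ch != chain):
            continue
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append((ch, int(line[22:26]), name, xyz))
    return atoms


def close_o_n_contacts(atoms: List[Atom], cutoff: float = 2.0) -> List[Tuple[str, int, str, int, float]]:
    """Return (O chain, O residue, N chain, N residue, distance) for O-N pairs closer than cutoff.

    Intra-residue pairs are skipped.
    """
    nitrogens = [a for a in atoms if a[2] == "N"]
    oxygens = [a for a in atoms if a[2] == "O"]
    hits = []
    for o_chain, o_res, _, o_xyz in oxygens:
        for n_chain, n_res, _, n_xyz in nitrogens:
            if (o_chain, o_res) == (n_chain, n_res):
                continue
            d = math.dist(o_xyz, n_xyz)
            if d < cutoff:
                hits.append((o_chain, o_res, n_chain, n_res, round(d, 2)))
    return hits


def keep_design(path: str, chain: Optional[str] = "A", cutoff: float = 2.0) -> bool:
    """True if the design has no suspiciously close O-N contacts in `chain`."""
    with open(path) as fh:
        return not close_o_n_contacts(read_backbone_n_o(fh, chain), cutoff)
```

Running `keep_design` over every output PDB and passing only the survivors to ProteinMPNN would implement the pre-filter described above; whether 2.0 Å is the right cutoff for your designs is something to check against a few structures you have inspected by eye.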

> when I set diffuser.T to 15, the output is no longer a cyclic peptide.

As mentioned in "A note on diffuser.T", RFdiffusion was initially trained with 200 timesteps. However, it was observed that (in most cases, at least for "standard" proteins) you can reduce that to 50 timesteps without much loss in quality, which is why 50 is the new default. Remember that diffusion is an iterative process in which you progressively remove "noise" from the structure. The number of timesteps is the number of times you remove noise, and since the total amount of noise is the same, fewer timesteps means you have to remove more noise at each step. At a certain point, RFdiffusion can't effectively remove all the noise it needs to.
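The "fewer steps means more noise per step" arithmetic can be made concrete with a deliberately simplified model that assumes the total noise is split evenly across steps. That even split is an assumption for illustration only: RFdiffusion's actual noise schedules are not uniform, so treat the numbers as relative scale, not as what the model really does.

```python
def per_step_share(T: int) -> float:
    """Average fraction of the total noise removed per step, assuming an even split over T steps."""
    if T <= 0:
        raise ValueError("T must be a positive number of timesteps")
    return 1.0 / T


def load_vs_training(T: int, trained_T: int = 200) -> float:
    """How many times more noise each step must remove compared with the 200-step training setting."""
    return per_step_share(T) / per_step_share(trained_T)


if __name__ == "__main__":
    for T in (200, 50, 30, 15):
        print(f"diffuser.T={T:>3}: ~{per_step_share(T):.1%} of the noise per step, "
              f"{load_vs_training(T):.1f}x the per-step load at T=200")
```

Under this toy model, the default of 50 asks each step to remove about 4 times as much noise as the 200-step training setting, while diffuser.T=15 asks for about 13 times as much.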

So it's a tradeoff between how long you're willing to let the program run and how effectively you want to denoise things. You're certainly not going to get bad structures by increasing the diffuser.T setting; it will just take longer to produce each output model. So if you're getting bad geometries with a low diffuser.T setting, I'd recommend increasing it and seeing what effect it has. If diffuser.T=15 isn't working for you, I might try 30 and see how things look. If they're still bad, try 50. If they look good at 30, you might be able to speed things up by dropping down to 25 or 20 while still getting decent results. (To some extent it comes down to how much time you want to spend optimizing the setting versus how much time it takes to simply produce the models with a larger diffuser.T setting.)
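The step-up/step-down search described above can be scripted as a small sweep. This is a sketch, not an official RFdiffusion tool: it only builds `run_inference.py` command lines that differ in `diffuser.T` and `inference.output_prefix`, so the results for each setting land in separate folders for side-by-side comparison. The `outputs/T_sweep/T<value>/design` layout and the base overrides are placeholders; copy in the contig, target, hotspot and cyclic overrides you actually use (for example, from `examples/design_macrocyclic_binder.sh`).

```python
import shlex
from typing import List, Sequence


def sweep_commands(base_overrides: Sequence[str], t_values: Sequence[int],
                   out_root: str = "outputs/T_sweep") -> List[List[str]]:
    """One run_inference.py command per diffuser.T value, each writing to its own prefix."""
    commands = []
    for t in t_values:
        commands.append([
            "python", "./scripts/run_inference.py",
            *base_overrides,
            f"inference.output_prefix={out_root}/T{t}/design",  # hypothetical output layout
            f"diffuser.T={t}",
        ])
    return commands


if __name__ == "__main__":
    # Placeholder overrides: add your own contigs / input_pdb / hotspots / cyclic settings here.
    base = ["inference.num_designs=4"]
    for cmd in sweep_commands(base, [15, 30, 50]):
        # Dry run: only print the command. To execute, pass `cmd` to subprocess.run(cmd, check=True).
        print(shlex.join(cmd))
```

Building argument lists instead of shell strings keeps quoting of overrides such as `contigmap.contigs=[...]` out of the shell's hands when the commands are eventually run through `subprocess`.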

P.S. It could be that the bad geometry (the "bonds") you're seeing in the figure is a symptom of the low diffuser.T setting: RFdiffusion isn't able to properly correct the geometry because it's trying to do too much in each step.

roccomoretti · Aug 01 '25

Hi Rocco, thanks a lot for your reply! Your explanation really cleared things up for me.

Nicole-DH · Aug 04 '25

Hi Nicole, did you solve the "unusual bond" problem? I am practicing the RFpeptide protocol with my own target and am running into the same issue.

DazLe-Q · Sep 17 '25

Hi DazLe, you can check Rocco’s answer above. For me, I just used a higher diffuser.T value.

Nicole-DH · Oct 20 '25

Hi Nicole, could you share which diffuser.T value you used?

DazLe-Q · Oct 25 '25