pdb-tools Improve `pdb_merge` to include `TER`s and `END` lines

Closes #149

Nov 04 '22 12:11 joaomcteixeira

That's right. However, I feel we are adding too much to the purpose of merge. All that should be done by tidy. Maybe if we exchange the -strict option for tidy we don't need to touch the merge. What do you think? Because the merge was really designed to be a concatenator

Nov 08 '22 08:11 joaomcteixeira

Yes - but why does it remove the END statements then? Other code don’t do that (e.g. pdb_chain).

Nov 08 '22 08:11 amjjbonvin

The current version of pdb_merge does not remove any lines from the input files. It's a straight concatenation.

https://github.com/haddocking/pdb-tools/blob/2a070bbacee9d6608b44bb6d2f749beefd6a7690/pdbtools/pdb_merge.py#L84-L87

Nov 08 '22 08:11 joaomcteixeira

So a simple cat command effectively.

Meaning pdb_tidy should always be run to correct things.

A very different behaviour than pdb_mkensemble for example.

Nov 08 '22 10:11 amjjbonvin

Yes. mkensemble has the clear purpose of making a correct ensemble of structures. While merge assumes the user knows what he/she is doing when concatenating the files. Likely the user cleaned the PDBs before using merge.

Nov 08 '22 11:11 joaomcteixeira

hmmm…. assumptions… this is asking for troubles…

It is this make clear to the users?

Yes. mkensemble has the clear purpose of making a correct ensemble of structures. While merge assumes the user knows what he/she is doing when concatenating the files. Likely the user cleaned the PDBs before using merge.

Nov 08 '22 11:11 amjjbonvin

Good point. Let's first clarify that to users before changing the behavior of pdb_merge.

Nov 24 '22 10:11 joaomcteixeira

I was reviewing this now, and I still agree we should not touch pdb_merge for this purpose and pipe the results to pdb_tidy as the logic is not that straightforward. It is stated in the docs.

The contents are not sorted and no lines are deleted (e.g. END, TER
statements) so we recommend piping the results through `pdb_tidy.py`.

Apr 03 '23 10:04 joaomcteixeira

How will you explain users that pdb_merge does require pdb_tidy to make a correct PDB while pdb_mkensemble not...

Apr 03 '23 10:04 amjjbonvin

You are right. I am addressing this. It's not as straightforward as adding an END because also TER was not accounted for. If TERs exist in the output is because they were already present in the input. I am working on it.

Apr 03 '23 22:04 joaomcteixeira

When doing pdb_merge, should atom numbers be renumbered starting from 1? This would change the original atom numbers but avoid repeated numbers.

Apr 06 '23 08:04 joaomcteixeira

Probably good indeed

Apr 06 '23 09:04 amjjbonvin

Hi @amjjbonvin

I have addressed this issue, plus some other details. Now, pdb_merge operates as follows:

The merged PDB file will represent a single MODEL.
Comment lines in input PDBs will be ignored (REMARK, ...)
Atom numbers are restarted from 1.
CONECT lines are yield at the end. CONECT numbers are updated to the new atom numbers.
Missing TER and END statements are placed accordingly. Original TER and END statements are maintained.

Inside tests/data/ there are three PDBs, dummy_merge_A/B/C.pdb that you can use to test. Test also with others you may know, and let me know.

Apr 11 '23 14:04 joaomcteixeira