Provide compound ID for all files
I think we should probably move towards a model where all ligands (or guests) in each benchmark set have an appropriate, unique, paper-specific numerical compound ID, rather than the current model where this is dependent on what set we're looking at. For example:
- CB7 Tables 1&2: Has unique CID we assigned
- GDCC Table 3: Has unique CID we assigned, but this will break if we want to provide structures docked into hosts, as there are two hosts but only one set of compound IDs
- GDCC Table 4: Has unique CID we assigned
- CD Tables 5 and 6: Has unique CID we assigned
- lysozyme Tables 7 and 8: No CIDs, uses compound names only
- BRD4(1) Table 9: Uses heterogeneous identifiers -- "Compound 4", "alprazolam", "Bzt-7", "JQ1(+)" etc.; this is probably the worst offender since some of these are pretty unsuitable as filenames due to special characters and/or spaces (e.g. some tools can't load files with spaces in their filenames and/or handle some of these special characters).
@GHeinzelmann @nhenriksen - thoughts? My preference I think is to make sure every set has a unique numerical compound ID in the tables and that this is used for all of the relevant files.
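A minimal sketch of the idea, assuming the ligand labels quoted above. The helper names, the `ligand-<ID>.mol2` naming pattern, and the order in which IDs are assigned are all hypothetical, not what the repository actually uses:

```python
import re

# Example labels taken from the BRD4(1) bullet above; the ordering is illustrative only.
names = ["Compound 4", "alprazolam", "Bzt-7", "JQ1(+)"]

# Anything other than letters, digits, dot, underscore, or hyphen is treated as unsafe.
UNSAFE = re.compile(r"[^A-Za-z0-9._-]")


def is_safe_filename(name):
    """Return True if the name has no spaces or special characters."""
    return UNSAFE.search(name) is None


def assign_ids(labels, start=1):
    """Map each label to a unique sequential numerical compound ID."""
    return {label: start + i for i, label in enumerate(labels)}


def ligand_filename(cid, prefix="ligand", ext="mol2"):
    """Build a filename from the numerical ID (prefix and extension are assumptions)."""
    return f"{prefix}-{cid}.{ext}"


ids = assign_ids(names)
for label, cid in ids.items():
    status = "ok" if is_safe_filename(label) else "unsafe as filename"
    print(f"{label!r} ({status}) -> {ligand_filename(cid)}")
```

Keeping the label-to-ID mapping in one place would also make it easy to generate the table column and the file names from the same source.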
That sounds good, and it can be done quickly I think. I'll change the ligand names to an assigned ID (from 1 to 10), and change the associated tables in the paper and in the README file.
I'm working on the BRD4(1) benchmarks table in the main paper, and it won't fit on the page if I keep the ligand names but also add an extra ligand ID column (as done in the CD tables). Should I drop the ligand names altogether? They might not be essential since we are also providing the references.
I'm all for dropping the ligand names, or if you really want to keep track of them, put them in footnotes or in a separate markdown file you link to.
No, we can drop them; I only gave the ligands names so the table would look the same as the lysozyme one. I'll just give a number for each, which will also make the table look better (it was a little off-center before since it was too wide). Then I'll change the README table and the ligand file names.
Resolved for bromodomains in #48; still needs to be done for lysozyme.