benchmarksets: Provide compound IDs for all files

I think we should move toward a model where every ligand (or guest) in each benchmark set has a unique, paper-specific numerical compound ID, rather than the current situation, where the identifier scheme depends on which set we're looking at. For example:

CB7 Tables 1 and 2: unique CIDs we assigned
GDCC Table 3: unique CIDs we assigned, but this will break if we want to provide structures docked into the hosts, since there are two hosts but only one set of compound IDs
GDCC Table 4: unique CIDs we assigned
CD Tables 5 and 6: unique CIDs we assigned
lysozyme Tables 7 and 8: no CIDs; uses compound names only
BRD4(1) Table 9: uses heterogeneous identifiers ("Compound 4", "alprazolam", "Bzt-7", "JQ1(+)", etc.). This is probably the worst offender, since some of these are unsuitable as filenames: they contain spaces and/or special characters that some tools can't load or handle.

@GHeinzelmann @nhenriksen, thoughts? My preference is to make sure every set has a unique numerical compound ID in its tables, and that this ID is used for all of the relevant files.
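The proposed scheme can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the repository's actual tooling: it uses only the four BRD4(1) identifiers quoted above, assigns IDs in list order (the real assignment is whatever the paper's tables specify), and assumes a `.mol2` extension purely for illustration. The helper names `is_filename_safe`, `assign_ids`, and `id_filename` are made up for this sketch. It flags which original names contain characters outside a conservative filename-safe set and derives a purely numerical filename for each ligand.

```python
import re

# Conservative "filename-safe" set: letters, digits, dot, underscore, hyphen.
# Spaces, parentheses, plus signs, etc. fall outside it.
UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9._-]")


def is_filename_safe(name: str) -> bool:
    """Return True if `name` contains only conservative filename characters."""
    return UNSAFE_CHARS.search(name) is None


def assign_ids(names: list[str], start: int = 1) -> dict[str, int]:
    """Map each ligand name to a unique integer compound ID, in the order given."""
    if len(set(names)) != len(names):
        raise ValueError("ligand names must be unique before IDs are assigned")
    return {name: cid for cid, name in enumerate(names, start=start)}


def id_filename(cid: int, ext: str = "mol2") -> str:
    """Build a filename from the numerical ID alone (the extension is an assumption)."""
    return f"{cid}.{ext}"


# Hypothetical subset of the BRD4(1) identifiers quoted in this issue.
ligand_names = ["Compound 4", "alprazolam", "Bzt-7", "JQ1(+)"]
compound_ids = assign_ids(ligand_names)

for name, cid in compound_ids.items():
    print(f"{name!r}: safe={is_filename_safe(name)} -> {id_filename(cid)}")
```

Because each filename is derived only from the integer, every generated name passes the same safety check regardless of how the original identifier was written; the original names can still be kept as a lookup elsewhere.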

Aug 25 '17 14:08 davidlmobley

That sounds good, and I think it can be done quickly. I'll change the ligand names to provided IDs (1 to 10) and update the associated tables in the paper and in the README file.

Aug 25 '17 15:08 GHeinzelmann

I'm working on the BRD4(1) benchmarks table in the main paper, and it won't fit on the page if I keep the ligand names and also add an extra ligand ID column (as in the CD tables). Should I drop the ligand names altogether? They might not be essential, since we also provide the references.

Aug 25 '17 18:08 GHeinzelmann

I'm all for dropping the ligand names or, if you really want to keep track of them, putting them in footnotes or in a separate markdown file that you link to.

Aug 25 '17 18:08 davidlmobley

No, we can drop them; I only included the ligand names so the table would look the same as the lysozyme one. I'll just give each ligand a number, which will also make the table look better (it was a little off-center before because it was too wide). Then I'll update the README table and the ligand file names.

Aug 25 '17 18:08 GHeinzelmann

Resolved for bromodomains in #48; still needs to be done for lysozyme.

Aug 28 '17 17:08 davidlmobley