Provide compound ID for all files

Open davidlmobley opened this issue 8 years ago • 5 comments

I think we should probably move towards a model where all ligands (or guests) in each benchmark set have an appropriate, unique, paper-specific numerical compound ID, rather than the current model where this is dependent on what set we're looking at. For example:

  • CB7 Tables 1 and 2: Have unique CIDs we assigned
  • GDCC Table 3: Has unique CIDs we assigned, but this will break if we want to provide structures docked into hosts, as there are two hosts but only one set of compound IDs
  • GDCC Table 4: Has unique CIDs we assigned
  • CD Tables 5 and 6: Have unique CIDs we assigned
  • lysozyme Tables 7 and 8: No CIDs, use compound names only
  • BRD4(1) Table 9: Uses heterogeneous identifiers: "Compound 4", "alprazolam", "Bzt-7", "JQ1(+)", etc. This is probably the worst offender, since some of these are pretty unsuitable as filenames due to special characters and/or spaces (e.g. some tools can't load files with spaces in their filenames and/or can't handle some of these special characters).

@GHeinzelmann @nhenriksen - thoughts? My preference, I think, is to make sure every set has a unique numerical compound ID in the tables and that this ID is used for all of the relevant files.
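
To make the proposal concrete, here is a minimal Python sketch of one way this could work, not a description of what the repository actually does. It assigns sequential numerical IDs to the BRD4(1) names quoted above and builds filenames from them. The function names, the `brd4` prefix, the `<system>-<ID>.<ext>` pattern, and the `mol2` extension are all assumptions made for illustration.

```python
import re

# Conservative whitelist: letters, digits, dot, underscore, hyphen.
# This is an assumed definition of "safe"; individual tools may differ.
SAFE_FILENAME = re.compile(r"^[A-Za-z0-9._-]+$")


def assign_compound_ids(names, start=1):
    """Map each ligand name to a sequential integer ID, in the order given."""
    ids = {}
    for name in names:
        if name in ids:
            raise ValueError(f"duplicate ligand name: {name!r}")
        ids[name] = start + len(ids)
    return ids


def filename_for(system, cid, ext="mol2"):
    """Build a filename from a system prefix and a numerical compound ID.

    The <system>-<ID>.<ext> pattern is hypothetical, not the repository's.
    """
    return f"{system}-{cid}.{ext}"


def is_safe_filename(filename):
    """Return True if the filename has no spaces or special characters."""
    return bool(SAFE_FILENAME.match(filename))


# Names taken from the BRD4(1) example in this issue.
ligands = ["Compound 4", "alprazolam", "Bzt-7", "JQ1(+)"]
ids = assign_compound_ids(ligands)

for name, cid in ids.items():
    # Compare a name-based filename with an ID-based one.
    print(name, is_safe_filename(f"{name}.mol2"), filename_for("brd4", cid))
```

For a case like GDCC Table 3, where two hosts share one set of compound IDs, the same pattern could in principle carry a host label in the prefix, but that is a separate design decision.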

davidlmobley, Aug 25 '17 14:08

That sounds good, and I think it can be done quickly. I'll change the ligand names to assigned IDs (from 1 to 10), and update the associated tables in the paper and in the README file.

GHeinzelmann, Aug 25 '17 15:08

I'm working on the BRD4(1) benchmark table in the main paper, and it won't fit on the page if I keep the ligand names and also add an extra ligand ID column (as done in the CD tables). Should I drop the ligand names altogether? They might not be essential, since we are also providing the references.

GHeinzelmann, Aug 25 '17 18:08

I'm all for dropping the ligand names, or if you really want to keep track of them, put them in footnotes or in a separate markdown file you link to.

davidlmobley, Aug 25 '17 18:08

No, we can drop them. I only gave the ligands names so the table would look the same as the lysozyme one. I'll just give each one a number, which will also make the table look better (it was a little off-center before because it was too wide). Then I'll change the README table and the ligand file names.

GHeinzelmann, Aug 25 '17 18:08

Resolved for bromodomains in #48; still needs to be done for lysozyme.

davidlmobley, Aug 28 '17 17:08