ReinventCommunity

What data was used for the prior?

Open rneeser opened this issue 1 year ago • 0 comments

Hi! Thanks for this awesome collection! I was wondering exactly which dataset was used to train the model random.prior.new. I assume this is the version trained on ChEMBL data with randomized SMILES? What additional filters were applied in terms of:

  • element types
  • number of heavy atoms
  • max number of e.g. rings with or without heteroatoms
  • deduplication based on stereochemistry
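
To make the question concrete, here is a rough sketch of the kind of preprocessing I have in mind, written with RDKit. Every threshold, the element whitelist, and the helper names (`passes_filters`, `stereo_free_key`, `filter_and_deduplicate`) are placeholders I made up for illustration; they are not the settings actually used for random.prior.new, which is exactly what I am asking about.

```python
from rdkit import Chem

# Placeholder values for illustration only; NOT the real settings of the prior.
ALLOWED_ELEMENTS = {"C", "N", "O", "S", "F", "Cl", "Br"}
MIN_HEAVY_ATOMS = 10
MAX_HEAVY_ATOMS = 50
MAX_RINGS = 6


def passes_filters(mol):
    """Return True if the molecule satisfies the hypothetical filters above."""
    # Element types: reject molecules containing any atom outside the whitelist.
    if any(atom.GetSymbol() not in ALLOWED_ELEMENTS for atom in mol.GetAtoms()):
        return False
    # Number of heavy atoms.
    if not MIN_HEAVY_ATOMS <= mol.GetNumHeavyAtoms() <= MAX_HEAVY_ATOMS:
        return False
    # Maximum number of rings (no distinction made here for heteroatoms).
    return mol.GetRingInfo().NumRings() <= MAX_RINGS


def stereo_free_key(mol):
    """Canonical SMILES with stereochemistry removed, used as a deduplication key."""
    mol_copy = Chem.Mol(mol)  # copy, because RemoveStereochemistry works in place
    Chem.RemoveStereochemistry(mol_copy)
    return Chem.MolToSmiles(mol_copy)


def filter_and_deduplicate(smiles_list):
    """Keep one stereo-free canonical SMILES per molecule that passes the filters."""
    seen = set()
    kept = []
    for smiles in smiles_list:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None or not passes_filters(mol):
            continue  # unparsable or filtered out
        key = stereo_free_key(mol)
        if key not in seen:
            seen.add(key)
            kept.append(key)
    return kept
```

With these placeholder settings, two enantiomers of the same compound would collapse into a single entry, while very small molecules or molecules with elements outside the whitelist would be dropped. I would like to know which of these steps (if any) were applied, and with what values.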

Thanks!

rneeser, Sep 23 '22 13:09