MolBERT
Dataset size and creation
Hi, first of all congrats on your article and the NeurIPS workshop.
I have a few questions:
- Regarding fine-tuning: do you update the pre-trained encoder, or do you freeze it?
- You say that any molecule with an ECFP4 similarity higher than 0.323 to 10 drugs was discarded. I assume this was done for generalisation. However, what type of similarity did you use (Tanimoto, Dice, etc.), and why 0.323? Also, have you performed any similarity-based clustering on the final dataset to ensure that the parsed chemical space is balanced?
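
To make the similarity question concrete, here is a minimal sketch of why the choice of metric matters for a fixed cutoff. It is only an illustration: the two fingerprints are made-up sets of on-bit indices (not real ECFP4 output), and the function names and the `THRESHOLD` constant are my own, not taken from the paper or the repository.

```python
# Hypothetical illustration: the same pair of fingerprints can fall on
# different sides of a 0.323 cutoff depending on the similarity metric.

def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def dice(a: set, b: set) -> float:
    """Dice similarity: 2 * |A & B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0

THRESHOLD = 0.323  # cutoff quoted in the question above

# Made-up on-bit indices standing in for two ECFP4 fingerprints.
fp_a = {1, 5, 9, 12, 40}
fp_b = {5, 9, 33, 47, 77, 81}

t = tanimoto(fp_a, fp_b)  # 2 shared bits / 9 distinct bits
d = dice(fp_a, fp_b)      # 2 * 2 shared bits / (5 + 6) bits

print(f"Tanimoto = {t:.3f}, above cutoff: {t > THRESHOLD}")
print(f"Dice     = {d:.3f}, above cutoff: {d > THRESHOLD}")
```

For this toy pair the Tanimoto value is below the cutoff while the Dice value is above it, so the filter would keep or discard the molecule depending on the metric. That is why I am asking which one was used.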