practicalcheminformatics
Assessing Interpretable Models | Practical Cheminformatics
Understanding and comparing the rationale behind machine learning model predictions
Beautiful! I really enjoy this topic of model interpretability. Could you comment on the problem (if there is one) of using a 1024-bit fingerprint to train an ML model with "not so many" molecules? I remember reading that your samples:features ratio should be at least 5:1, but it is hard to find 5000 molecules for a lot of specific QSAR tasks. By the way, there seems to be a small formatting problem with the formula after "Matveieva and Polishchuk define a topn score as".
A lot of the ideas behind the "5:1 rule" come from linear regression and aren't relevant to modern ML techniques like ensemble methods and neural nets, which have alternative methods for dealing with overfitting.
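To see where a samples:features rule comes from in the linear regression setting, here is a minimal sketch using only the Python standard library. It is a toy illustration, not code from the post: the data are pure Gaussian noise rather than fingerprints, and the helper names (`solve_linear_system`, `mse`, `make_noise_data`) are made up for this example. With as many features as samples, an unregularized linear fit can reproduce random training labels exactly, so the training error says nothing about how the model generalizes.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w


def mse(x, y, w):
    """Mean squared error of the linear model w on (x, y)."""
    preds = [sum(xi * wi for xi, wi in zip(row, w)) for row in x]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)


def make_noise_data(rng, n_samples, n_features):
    """Random features and random labels: there is no signal to learn."""
    x = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
    y = [rng.gauss(0, 1) for _ in range(n_samples)]
    return x, y


rng = random.Random(0)
n = 10  # assumption for the toy: samples == features, a 1:1 ratio
x_train, y_train = make_noise_data(rng, n, n)
x_test, y_test = make_noise_data(rng, n, n)

# An unregularized linear fit with n features and n samples
# passes through every training label, even though they are noise.
w = solve_linear_system(x_train, y_train)
train_mse = mse(x_train, y_train, w)
test_mse = mse(x_test, y_test, w)

print(f"train MSE: {train_mse:.3e}")
print(f"test MSE:  {test_mse:.3e}")
```

The training error is zero up to floating-point rounding while the error on fresh noise is not, which is the failure mode the ratio rule guards against. Ensemble methods and neural nets control this differently (for example bagging and feature subsampling, or weight decay and early stopping), so for those models a held-out set or cross-validation is a more direct check than a fixed ratio.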