PADME
Refactoring the code
The current code works, but some of the scripts are too complicated to understand, such as splits/splitters.py, metrics/__init__.py, and ./NCI60_data/preprocess.py, often because they contain large chunks of duplicated code. Improvements are needed especially in splits/splitters.py, which currently does not allow some parameter combinations and uses assert statements as a way to fail early. I will try to handle these cases more gracefully.
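As a rough illustration of what "more graceful" could mean here, the sketch below replaces bare assert statements with explicit checks that raise a descriptive exception. One reason to prefer this is that assert statements are skipped entirely when Python runs with the -O flag. The names SplitConfigError and validate_split_params, and the parameters frac_train, frac_valid, frac_test and k_fold, are hypothetical and are not taken from splits/splitters.py.

```python
class SplitConfigError(ValueError):
    """Raised when a combination of split parameters is not supported."""


def validate_split_params(frac_train, frac_valid, frac_test, k_fold=None):
    """Check split parameters and raise a descriptive error on bad input.

    All parameter names here are assumptions made for illustration only.
    """
    fracs = {
        "frac_train": frac_train,
        "frac_valid": frac_valid,
        "frac_test": frac_test,
    }
    # Each fraction must be a valid proportion.
    for name, value in fracs.items():
        if not 0.0 <= value <= 1.0:
            raise SplitConfigError(f"{name} must lie in [0, 1], got {value}")

    # The fractions must cover the whole dataset (up to rounding error).
    total = sum(fracs.values())
    if abs(total - 1.0) > 1e-8:
        raise SplitConfigError(f"fractions must sum to 1, got {total:.4f}")

    # Cross-validation needs at least two folds.
    if k_fold is not None and k_fold < 2:
        raise SplitConfigError(f"k_fold must be at least 2, got {k_fold}")
```

A caller can then catch SplitConfigError and report the message, instead of crashing with an AssertionError that carries no explanation.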
Cleanups are also needed in some files.
Thresholding the continuous predictions to yield binary outputs is currently hard-coded, which is error-prone; I will refactor it if necessary. Also, DeepChem implements range estimation using Bayesian statistics, and I may need to incorporate this into the code as well.
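One possible direction for the thresholding refactor, sketched under assumptions: pass the cutoff and its direction as explicit arguments instead of embedding a constant in the code. The function name binarize and the positive_above flag are hypothetical, and the real code likely operates on arrays rather than plain lists.

```python
import math


def binarize(predictions, threshold, positive_above=True):
    """Convert continuous predictions into 0/1 labels.

    threshold: cutoff value supplied by the caller (not hard-coded).
    positive_above: if True, values >= threshold are labelled 1;
        if False, values <= threshold are labelled 1.
    """
    if not math.isfinite(threshold):
        raise ValueError(f"threshold must be a finite number, got {threshold}")
    if positive_above:
        return [1 if p >= threshold else 0 for p in predictions]
    return [1 if p <= threshold else 0 for p in predictions]
```

Keeping the direction explicit matters because for some measurement types a lower value indicates the positive class, so a single hard-coded comparison can silently invert the labels.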
The code is now much more modularized, though problems remain in some of the scripts, where I was a bit lazy and simply chose a "quick and dirty" solution. These still need to be fixed.
Since DeepChem is actively maintained but this repository might not be, I need to decouple the two repos so that this repo is self-contained and functions correctly without any imports from DeepChem.
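A first step toward decoupling could be to list every place that still imports DeepChem. The following is a minimal sketch using the standard library ast module; the function name find_package_imports is hypothetical, and the package name deepchem is assumed to be the import name used in this repo.

```python
import ast


def find_package_imports(source, package="deepchem"):
    """Return sorted (line_number, module_name) pairs for imports of `package`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # e.g. "import deepchem as dc"
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # e.g. "from deepchem.utils import save" (relative imports skipped)
            names = [node.module]
        else:
            continue
        for name in names:
            if name == package or name.startswith(package + "."):
                hits.append((node.lineno, name))
    return sorted(hits)
```

Running this over the text of each .py file in the repo gives a checklist of imports to replace; an empty result for every file would suggest the repo no longer depends on DeepChem at import time.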