Regression Task with Biological Activity Data Using Pretrained Chemformer
Dear Chemformer Team,
I am starting a project to fit a regression model on biological activity data (specifically, pXC50 values) using the pretrained Chemformer model. The objective is to predict activity values from SMILES strings.
In the process of setting up my environment and preparing for fine-tuning, I encountered a closed issue https://github.com/MolecularAI/Chemformer/issues/13 and a fork of the repository, which provided clear examples and scripts for fine-tuning Chemformer on regression tasks. Notably, these resources referenced RegPropDataModule(_AbsDataModule) in finetune_regression_modules.py, suggesting it as a viable option for regression with Chemformer.
However, upon revisiting the Chemformer repository, it appears that the finetune_regression directory and RegPropDataModule class are no longer present in the example_scripts folder, which has left me uncertain about the best approach to undertake my regression task with the latest codebase.
With the above context, I am reaching out to seek your guidance on several points:
- Current Recommended DataModule: Given the removal of RegPropDataModule and associated fine-tuning examples, could you advise on which DataModule in the current code structure is best suited for handling a dataset of SMILES strings with pXC50 values for regression analysis?
- Script Selection: Among the scripts present in the repository (e.g., fine_tune.py, inference_score.py, predict.py), which would you recommend for fine-tuning the pretrained model on a regression dataset and for making subsequent predictions?
- Further Recommendations: If there are any specific recommendations regarding data preprocessing, hyperparameter selection, or other considerations to optimize the use of Chemformer for this regression task, I would be grateful for your insights.
Thank you very much for your time and support!
Hi, as you noticed, we have removed support for regression in the newer Chemformer version. However, you can have a look at this old release: https://github.com/MolecularAI/Chemformer/releases/tag/1.0
It may contain the scripts and datamodules you refer to.
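On the data preprocessing question, here is a minimal sketch of one common way to prepare SMILES/pXC50 pairs before handing them to any regression datamodule. It is not Chemformer code and does not use the Chemformer API. The column names `smiles` and `pXC50`, the helper names, and the toy values in `EXAMPLE_CSV` are all assumptions made up for illustration, so adapt them to whatever format the datamodule in that release expects.

```python
import csv
import io
import random
import statistics

# Hypothetical toy data: the pXC50 values are invented for illustration only.
EXAMPLE_CSV = """smiles,pXC50
CCO,5.1
CCN,
c1ccccc1,6.3
CC(=O)O,4.8
CCOC(=O)C,5.9
c1ccncc1,6.7
CC(C)O,5.0
CCCC,4.2
c1ccccc1O,7.1
CC(=O)N,5.5
CCCl,4.9
"""


def load_records(csv_text):
    """Parse CSV text into (smiles, pXC50) pairs, skipping incomplete rows."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        smiles = row["smiles"].strip()
        value = row["pXC50"].strip()
        if not smiles or not value:
            continue  # drop rows without a SMILES string or a target value
        records.append((smiles, float(value)))
    return records


def split_records(records, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle with a fixed seed and split into train/val/test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    n_test = int(len(shuffled) * test_fraction)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


def fit_scaler(train):
    """Compute mean and standard deviation of the training targets only."""
    values = [y for _, y in train]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # avoid division by zero
    return mean, std


def scale(y, mean, std):
    """Standardize a target value."""
    return (y - mean) / std


def unscale(z, mean, std):
    """Map a standardized prediction back to pXC50 units."""
    return z * std + mean
```

The scaler is fitted on the training split only so that validation and test targets do not leak into the normalization, and `unscale` converts model outputs back to pXC50 units for reporting. A random split is used here for brevity; whether a scaffold-based split would suit your data better is a separate judgment call.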
Thank you for your response, @anniewesterlund. May I ask why support for regression was removed in the new version?