healthcareai-py icon indicating copy to clipboard operation
healthcareai-py copied to clipboard

Refactor Cleanup

Open Aylr opened this issue 7 years ago • 4 comments

Aylr avatar May 05 '17 05:05 Aylr

I think one problem I see in the code is at: healthcareai py/healthcareai/develop_supervised_model.py.

When you have a lot of supervised models for this class: DevelopSupervisedModel

This class eventually get very long. I think one way we can solve this is to move the linear, and random forests into separate files.

Then in the new phase of the class (or use a metaclass), we would programmatically import the functions in a specified folder, and since the functions are unbounded when they are first imported, we can change them to a bounded function on the fly and add the function to the class itself.

mxlei01 avatar Jul 27 '17 06:07 mxlei01

There are many boiler plate code scattered around the repository, for example:

def validate_regression(self, model_name=None):
    """
    Raises error if not a regression trainer.
        
    Args:
        model_name (str): Nice string for error messages
    """
    if not self.is_regression:
        raise HealthcareAIError('A {} model can only be trained with a regression trainer.'.format(model_name))

Developers would need to call this function every time when they want to add an algorithm.

Instead, if we made this a decorator, or even better, apply the decorator to all of the function when this class is initiated. We can apply this decorator to function names which ends in classification, regression, regressor.

I think this way is also safer because someone might forget to add this test.

mxlei01 avatar Jul 29 '17 02:07 mxlei01

Some reinventing the wheel, for example

class DataFrameImputer(TransformerMixin):

sklearn already has it's own imputer: http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Imputer.html

It also has different types of impute such as most occurrence and median value.

mxlei01 avatar Jul 30 '17 06:07 mxlei01

@mxlei01 Not sure how I missed all these great ideas. I really like the decorator validator idea. Would you like to create an issue and start it?

Regarding the imputer, yes we are recreating some scikit functionality, however, scikit returns numpy arrays, and currently most of healthcareai passes around dataframes until they are ultimately handed to scikit.

Aylr avatar Sep 18 '17 12:09 Aylr