Compose migration

Open sarahmish opened this issue 3 years ago • 0 comments

Prediction Engineering

How to use Compose to write the problem definition component in Cardea.

Compose is a machine learning tool for automated prediction engineering. It lets you structure prediction problems and generate labels for supervised learning. We can use Compose to search for the cutoff times of a specific prediction problem (e.g. length of stay) and return the resulting label_times.
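
To make the idea of label times concrete, here is a minimal pandas sketch of what a label search produces when there is one example per instance: the labeling function is applied to each instance's rows, and the earliest timestamp in that slice is recorded as the cutoff time. This is an illustrative stand-in, not Compose's implementation; `search_label_times` and the sample columns (`patient_id`, `created`, `status`) are assumptions made for the example.

```python
import pandas as pd


def search_label_times(df, target, time_index, labeling_function):
    """Label each target instance once, using all of its rows as the data slice."""
    rows = []
    # Sort by time so the first row of every group is the earliest one.
    for instance_id, ds in df.sort_values(time_index).groupby(target):
        rows.append({
            target: instance_id,
            "time": ds[time_index].iloc[0],  # cutoff time: start of the slice
            "label": labeling_function(ds),
        })
    return pd.DataFrame(rows, columns=[target, "time", "label"])


def missed(ds):
    # True if any appointment in the slice was a no-show.
    return "noshow" in ds["status"].values


# Hypothetical sample data: two patients with two appointments each.
appointments = pd.DataFrame({
    "patient_id": [1, 1, 2, 2],
    "created": pd.to_datetime(
        ["2021-01-05", "2021-02-10", "2021-01-07", "2021-03-01"]
    ),
    "status": ["fulfilled", "noshow", "fulfilled", "fulfilled"],
})

label_times = search_label_times(appointments, "patient_id", "created", missed)
print(label_times)
```

Each row of the result pairs an instance with a cutoff time and a label, which is the shape of information that the feature generation phase needs.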

The component should be easily adaptable to support multiple prediction problems:

  • appointment no show
  • mortality prediction
  • length of stay
  • etc

Design

There are two main parts that we need to define, plus a set of helpers:

  • A class whose main function is generating label times
  • Functions that each define a prediction problem
  • Helper functions used to create the prediction problem

Design of data_labeler.py

class DataLabeler:
    """Class that defines the prediction problem.

    This class supports the generation of `label_times`, which
    is fundamental to the feature generation phase as well
    as to specifying the target labels.

    Args:
        function (method):
            function that defines the prediction problem, it should return a
            tuple of the labeling function, the dataframe, and a dictionary
            of metadata describing the prediction task.
    """
    def __init__(self, function):
        self.function = function

    def generate_label_times(self, es, *args, **kwargs):
        """Searches the data to calculate label times.

        Args:
            es (featuretools.EntitySet):
                Entity set holding the data to search and extract labels from.

        Returns:
            composeml.LabelTimes:
                Calculated labels with cutoff times.
        """
        pass
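
One way `generate_label_times` could be filled in is to unpack the tuple returned by the problem definition and hand it to Compose's `LabelMaker`. The sketch below is not a settled implementation. It assumes the `meta` keys used in this issue (`entity`, `time_index`, `num_examples_per_instance`) plus an optional `window_size` key, which is an addition for illustration. It uses the `target_dataframe_name` keyword of recent Compose releases; older releases call the same argument `target_entity`. Compose is imported inside the method only so that the class can be defined without it installed.

```python
class DataLabeler:
    """Generates label times for one prediction problem (illustrative sketch)."""

    def __init__(self, function):
        self.function = function

    def generate_label_times(self, es, *args, **kwargs):
        import composeml as cp  # lazy import: only needed when labels are generated

        # The problem definition returns (labeling function, dataframe, metadata).
        labeling_function, df, meta = self.function(es, *args, **kwargs)

        label_maker = cp.LabelMaker(
            target_dataframe_name=meta["entity"],  # `target_entity` in older releases
            time_index=meta["time_index"],
            labeling_function=labeling_function,
            window_size=meta.get("window_size"),  # assumed optional key
        )

        # Sort by the time index before searching for labels.
        return label_maker.search(
            df.sort_values(meta["time_index"]),
            num_examples_per_instance=meta.get("num_examples_per_instance", 1),
            verbose=False,
        )
```

With this shape, supporting a new prediction problem only requires a new definition function; the class itself stays unchanged, e.g. `DataLabeler(appointment_no_show).generate_label_times(es)`.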

Design of a prediction function (e.g. appointment_no_show.py)

def appointment_no_show(es):
    def missed(ds, **kwargs):
        # label is True if any appointment in the data slice was a no-show
        return 'noshow' in ds["status"].values

    meta = {
        # values to define prediction task
        "entity": "appointment",
        "time_index": "created",
        "type": "classification",
        "num_examples_per_instance": 1
    }

    # denormalize is one of the helper functions (not defined here)
    df = denormalize(es, entities=['Appointment'])

    return missed, df, meta
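
The `denormalize` helper is not defined in this issue. As a rough sketch of what such a helper might do, the example below flattens one entity by left-joining its parent tables. It is hypothetical throughout: it takes a plain dict of pandas DataFrames and a list of `(parent, parent_key, child, child_key)` tuples instead of an entity set, handles a single entity rather than an `entities` list, and the table and column names are invented for the example.

```python
import pandas as pd


def denormalize(tables, relationships, entity):
    """Join `entity` with each of its parent tables into one flat DataFrame.

    tables: dict mapping entity name -> DataFrame (stand-in for an entity set)
    relationships: list of (parent, parent_key, child, child_key) tuples
    entity: name of the child entity to flatten
    """
    df = tables[entity]
    for parent, parent_key, child, child_key in relationships:
        if child != entity:
            continue
        # A left join keeps every child row, even without a matching parent.
        df = df.merge(
            tables[parent],
            how="left",
            left_on=child_key,
            right_on=parent_key,
            suffixes=("", f"_{parent}"),
        )
    return df


# Hypothetical tables and relationship.
tables = {
    "Patient": pd.DataFrame({
        "patient_id": [10, 11],
        "gender": ["female", "male"],
    }),
    "Appointment": pd.DataFrame({
        "appointment_id": [1, 2, 3],
        "patient": [10, 11, 10],
        "status": ["noshow", "fulfilled", "fulfilled"],
    }),
}
relationships = [("Patient", "patient_id", "Appointment", "patient")]

flat = denormalize(tables, relationships, "Appointment")
print(flat)
```

The flattened frame keeps one row per appointment, so a labeling function such as `missed` can read `status` alongside the parent's columns.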

sarahmish, Mar 30 '21 20:03