
Question: Frequently predict same samples during training

Status: Open · MarselScheer opened this issue · 3 comments

**Describe the use case**

I want to monitor a few samples by checking how their predictions change during training, for instance at the end of each epoch.

**Describe the solution you'd like**

I would like to leverage Callbacks for this: save the dataframe with the samples in the callback instance, then compute their predictions during on_epoch_end().

I did not manage to get it working, however. I have found the Predictor class and the PandasDataset(Manager) class, but I have the feeling that I am reaching for classes that are not meant to be used by end users like me.

— MarselScheer, Oct 09 '23 09:10

Hi @MarselScheer,

Ludwig uses the ProgressTracker to store metrics and other artifacts produced by the evaluation runs that happen over the course of training.

The progress tracker is provided to the on_eval_end() and on_epoch_end() callbacks.

Some ideas for how to proceed:

  • during eval, add a small sample of predictions to ProgressTracker, which is already passed to callbacks
  • export a sample of predictions from eval and pipe them through to callbacks, alongside ProgressTracker.
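A minimal, hypothetical sketch of these two ideas. The `ProgressTracker` below is a bare stand-in for Ludwig's real tracker (which is passed to `on_eval_end()` and `on_epoch_end()`), and `sample_preds` / `export_sample_preds` are invented names, not Ludwig API:

```python
class ProgressTracker:  # stand-in for Ludwig's ProgressTracker
    def __init__(self):
        self.steps = 0
        self.sample_preds = []  # extra attribute the callback piggybacks on


class SamplePredsCallback:
    def on_eval_end(self, trainer, progress_tracker, save_path):
        # idea 1: during eval, stash a small sample of predictions on the
        # tracker; the dict here is a placeholder for real predictions
        preds = {"label": ["dog", "cat"]}
        progress_tracker.sample_preds.append((progress_tracker.steps, preds))

    def on_epoch_end(self, trainer, progress_tracker, save_path):
        # idea 2: export the accumulated samples (to disk, wandb, ...)
        self.export_sample_preds(progress_tracker.sample_preds)

    def export_sample_preds(self, history):
        for step, preds in history:
            print(step, preds)


tracker = ProgressTracker()
cb = SamplePredsCallback()
cb.on_eval_end(None, tracker, None)
cb.on_epoch_end(None, tracker, None)
```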

justinxzhao avatar Oct 10 '23 17:10 justinxzhao

Sorry, but I don't get it. Let me summarize what you propose as pseudo code:

```python
# The progress tracker is provided to the on_eval_end() and on_epoch_end() callbacks.
class MyCallback:

    def on_eval_end(..., progress_tracker):
        # during eval, add a small sample of predictions to ProgressTracker,
        # which is already passed to callbacks
        progress_tracker.my_sample_preds.append(sample_preds)

    def on_epoch_end(..., progress_tracker):
        # export a sample of predictions from eval and pipe them through to
        # callbacks, alongside ProgressTracker
        self.my_export_of_sample_preds(progress_tracker.my_sample_preds)
```

My main problem here is that I have no idea how to compute sample_preds while I am inside the callback. I have noticed that there are some classes that might help, but at the moment I don't clearly see how to use them properly. Here is the naive way I would like to solve it:

```python
class MyCallback:
    def __init__(self, sample_df):
        self.df = sample_df

    def on_xyz(self, trainer, progress_tracker, ...):
        preds = trainer.model.predict(self.df)
        # might store on disk or send to wandb or another service
        self.save(progress_tracker.steps, self.df, preds)
```

The problem is that trainer.model is not a LudwigModel. I see that the trainer itself uses the Predictor class to do the evaluation during training: https://github.com/ludwig-ai/ludwig/blob/6d74d21d71ecc7c125598a93f600b95aa35e2811/ludwig/trainers/trainer.py#L1252-L1259

So if I want to mimic that, I somehow need to convert my polars/pandas dataframe sample_df into a dataset, probably using https://github.com/ludwig-ai/ludwig/blob/6d74d21d71ecc7c125598a93f600b95aa35e2811/ludwig/data/dataset/pandas.py#L117-L117

At this point I have the feeling that I am not doing it correctly: I am going too deep into how things are done internally and starting to use classes that are not meant to be used by end users like me.

— MarselScheer, Oct 12 '23 04:10

I investigated the codebase a little more, and here is a callback that at least no longer throws an error:

```python
from ludwig.callbacks import Callback
from ludwig.data.preprocessing import preprocess_for_prediction
from ludwig.models.predictor import Predictor


class MyCallback(Callback):
    def __init__(self, df):
        self.df = df

    def on_preprocess_end(
        self,
        training_set,
        validation_set,
        test_set,
        training_set_metadata,
    ):
        # keep the metadata so the sample dataframe can be preprocessed
        # the same way as the training data
        self.training_set_metadata = training_set_metadata

    def on_train_start(self, model, config, config_fp):
        # preprocess the sample dataframe once, before training begins
        self.ds = preprocess_for_prediction(
            config=config,
            dataset=self.df,
            training_set_metadata=self.training_set_metadata,
        )[0]

    def on_epoch_end(self, trainer, progress_tracker, save_path):
        predictor = Predictor(
            trainer.dist_model,
            batch_size=128,
            distributed=trainer.distributed,
            report_tqdm_to_ray=False,
            model=trainer.model,
        )
        print(progress_tracker.steps, predictor.batch_predict(self.ds))
```

It more or less looks reasonable to me, though I don't know whether postprocessing was already applied to those predictions. However, I still have the feeling that I am messing around with the inner workings of Ludwig by using Predictor() and preprocess_for_prediction(), which does not feel right.
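For context on what "postprocessing" would mean here: for a category output feature, Ludwig's postprocessing step maps the raw probability vector back to the original label string via the idx2str table stored in training_set_metadata. The sketch below is a pure-Python stand-in, not Ludwig code; the exact metadata layout is an assumption for illustration:

```python
# Stand-in illustration of category postprocessing: take the argmax of each
# raw probability vector and look the index up in the feature's idx2str table.
# The metadata dict mimics Ludwig's training_set_metadata layout (assumed).

training_set_metadata = {"label": {"idx2str": ["cat", "dog", "bird"]}}

raw_probs = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
]

idx2str = training_set_metadata["label"]["idx2str"]
decoded = [idx2str[max(range(len(p)), key=p.__getitem__)] for p in raw_probs]
print(decoded)  # ['dog', 'cat']
```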

Hopefully, this concrete implementation makes it easier to guide me in the right direction :-)

— MarselScheer, Oct 12 '23 06:10