ludwig
Question: Frequently predict the same samples during training
**Describe the use case**
I want to monitor a few samples by checking how their predictions change during training, for instance at the end of each epoch.

**Describe the solution you'd like**
I would like to leverage callbacks for this. One could save the DataFrame with the samples in the Callback instance and then calculate their predictions during `on_epoch_end()`.
I did not manage to get it working. I have already found the `Predictor` class and the `PandasDataset(Manager)` class, but I somehow have the feeling that I am starting to use classes that are not meant to be used by end users like me.
Hi @MarselScheer,
Ludwig uses the `ProgressTracker` to store metrics and other artifacts during the evaluation runs that happen during training. The progress tracker is provided to the `on_eval_end()` and `on_epoch_end()` callbacks.
Some ideas for how to proceed:
- during eval, add a small sample of predictions to `ProgressTracker`, which is already passed to callbacks
- export a sample of predictions from eval and pipe them through to callbacks, alongside `ProgressTracker`.
Sorry, but I don't get it. I'll try to summarize what you propose as pseudocode:
```python
# The progress tracker is provided to the on_eval_end() and on_epoch_end() callbacks.
class MyCallback:
    def on_eval_end(self, ..., progress_tracker):
        # during eval, add a small sample of predictions to ProgressTracker,
        # which is already passed to callbacks
        progress_tracker.my_sample_preds.append(sample_preds)

    def on_epoch_end(self, ..., progress_tracker):
        # export a sample of predictions from eval and pipe them through to
        # callbacks, alongside ProgressTracker
        self.my_export_of_sample_preds(progress_tracker.my_sample_preds)
```
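To make sure I understood the proposed pattern correctly, here it is made runnable with plain-Python stand-ins (the `ProgressTracker` below is a hypothetical minimal stand-in of my own, not Ludwig's real class):

```python
class ProgressTracker:
    """Stand-in for Ludwig's ProgressTracker: a bag of metrics/artifacts."""

    def __init__(self):
        self.steps = 0
        self.my_sample_preds = []  # custom attribute used by the callback


class MyCallback:
    def on_eval_end(self, progress_tracker, sample_preds):
        # During eval, stash a small sample of predictions on the tracker.
        progress_tracker.my_sample_preds.append(sample_preds)

    def on_epoch_end(self, progress_tracker):
        # At epoch end, export everything collected so far.
        return list(progress_tracker.my_sample_preds)


tracker = ProgressTracker()
cb = MyCallback()
cb.on_eval_end(tracker, {"sample_0": 0.9})
cb.on_eval_end(tracker, {"sample_0": 0.7})
exported = cb.on_epoch_end(tracker)
print(exported)  # [{'sample_0': 0.9}, {'sample_0': 0.7}]
```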
My main problem here is that I have no idea how to calculate `sample_preds` while I am in the callback. I have already recognized that there are some classes that might help, but at the moment I don't clearly see how to use them properly. Here is the naive way I would like to solve it:
```python
class MyCallback:
    def __init__(self, sample_df):
        self.df = sample_df

    def on_xyz(self, trainer, progress_tracker, ...):
        preds = trainer.model.predict(self.df)
        # might store on disk or send to wandb or another service
        self.save(progress_tracker.steps, self.df, preds)
```
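For the `self.save(...)` step, a minimal Ludwig-independent sketch (the file name and column layout are purely my own choices for illustration) that appends the per-step predictions to a CSV might look like:

```python
import csv
import os


def save_sample_preds(step, preds, path="sample_preds.csv"):
    """Append one row per monitored sample: (step, sample_id, prediction).

    `preds` is assumed to be a mapping from sample id to prediction;
    path and columns are illustrative, not anything Ludwig prescribes.
    """
    write_header = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(["step", "sample_id", "prediction"])
        for sample_id, pred in preds.items():
            writer.writerow([step, sample_id, pred])


# usage: one call per epoch
if os.path.exists("demo_preds.csv"):
    os.remove("demo_preds.csv")
save_sample_preds(100, {"a": 0.91, "b": 0.13}, path="demo_preds.csv")
save_sample_preds(200, {"a": 0.95, "b": 0.08}, path="demo_preds.csv")
```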
The problem is that `trainer.model` is not a `LudwigModel`. I see that the trainer itself uses the `Predictor` class to do the evaluation during training: https://github.com/ludwig-ai/ludwig/blob/6d74d21d71ecc7c125598a93f600b95aa35e2811/ludwig/trainers/trainer.py#L1252-L1259
So if I want to mimic that, I somehow need to convert my polars/pandas DataFrame `sample_df` into a dataset, probably using https://github.com/ludwig-ai/ludwig/blob/6d74d21d71ecc7c125598a93f600b95aa35e2811/ludwig/data/dataset/pandas.py#L117-L117
At this point I have the feeling that I am not doing it correctly, that I am going too deep into how things are done internally, and that I am starting to use classes that are not meant to be used by end users like me.
I investigated the codebase a little bit more, and here is a callback that at least does not throw an error anymore:
```python
from ludwig.data.preprocessing import preprocess_for_prediction
from ludwig.models.predictor import Predictor


class MyCallback(Callback):
    def __init__(self, df):
        self.df = df

    def on_preprocess_end(
        self,
        training_set,
        validation_set,
        test_set,
        training_set_metadata,
    ):
        self.training_set_metadata = training_set_metadata

    def on_train_start(self, model, config, config_fp):
        self.ds = preprocess_for_prediction(
            config=config,
            dataset=self.df,
            training_set_metadata=self.training_set_metadata,
        )[0]

    def on_epoch_end(self, trainer, progress_tracker, save_path):
        predictor = Predictor(
            trainer.dist_model,
            batch_size=128,
            distributed=trainer.distributed,
            report_tqdm_to_ray=False,
            model=trainer.model,
        )
        print(progress_tracker.steps, predictor.batch_predict(self.ds))
```
It more or less looks reasonable to me, though I don't know whether the postprocessing was already applied to those predictions. However, I still have the feeling that I am messing around with the inner workings of Ludwig by using `Predictor()` and `preprocess_for_prediction()`, which does not feel right.

Hopefully, this concrete implementation makes it easier to guide me in the right direction :-)