
Not uploading confusion_matrix (and others) figure to Comet ML

Open CostaFernando opened this issue 1 year ago • 4 comments

Describe the bug

Hi, guys!

I ran an experiment and tried to send the confusion_matrix visualization to Comet ML, following the Third-Party Integrations section of Ludwig's documentation (https://ludwig.ai/latest/user_guide/integrations/#comet-ml), but I could not get the figure uploaded to Comet ML.

I tested a few times with different configurations: I am able to upload the learning_curves visualizations to Comet ML, but not other visualizations such as confusion_matrix, roc_curves_from_test_statistics, and precision_recall_curves_from_test_statistics.

When I run ludwig visualize --comet --visualization learning_curves --training_statistics ./results/experiment_run/training_statistics.json, it correctly uploads the generated figures and the following output appears:

COMET INFO: Couldn't find a Git repository in '/home/ec2-user/ludwig-ai-playground' nor in any parent directory. You can override where Comet is looking for a Git Patch by setting the configuration `COMET_GIT_DIRECTORY`
COMET INFO: Experiment is live on comet.com https://my_comet_ml_experiment_uri

COMET WARNING: Experiment.set_code(code=...) is deprecated, use Experiment.log_code(code=..., code_name=...) instead
COMET ERROR: '/home/ec2-user/ludwig-ai-playground/.comet.config' exists and force is not True; refusing to overwrite
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO: Comet.ml ExistingExperiment Summary
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO:   Data:
COMET INFO:     display_summary_level : 1
COMET INFO:     url                   : https://my_comet_ml_experiment_uri
COMET INFO:   Uploads:
COMET INFO:     figures     : 3
COMET INFO:     filename    : 1
COMET INFO:     html        : 1
COMET INFO:     source_code : 1
COMET INFO: 
COMET INFO: Uploading metrics, params, and assets to Comet before program termination (may take several seconds)
COMET INFO: The Python SDK has 3600 seconds to finish uploading collected data
COMET INFO: Waiting for completion of the file uploads (may take several seconds)
COMET INFO: The Python SDK has 10800 seconds to finish uploading collected data
COMET INFO: All files uploaded, waiting for confirmation they have been all received

But when I run ludwig visualize --comet --visualization confusion_matrix --ground_truth_metadata ./results/experiment_run/model/training_set_metadata.json --test_statistics ./results/experiment_run/test_statistics.json --top_n_classes 2, it doesn't upload any figure. The following output appears:

COMET INFO: Couldn't find a Git repository in '/home/ec2-user/ludwig-ai-playground' nor in any parent directory. You can override where Comet is looking for a Git Patch by setting the configuration `COMET_GIT_DIRECTORY`
COMET INFO: Experiment is live on comet.com https://my_comet_ml_experiment_uri

COMET WARNING: Experiment.set_code(code=...) is deprecated, use Experiment.log_code(code=..., code_name=...) instead
COMET ERROR: '/home/ec2-user/ludwig-ai-playground/.comet.config' exists and force is not True; refusing to overwrite
/home/ec2-user/ludwig-ai-playground/venv/lib64/python3.9/site-packages/ludwig/utils/visualization_utils.py:1167: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax.set_xticklabels([""] + labels, rotation=45, ha="left")
/home/ec2-user/ludwig-ai-playground/venv/lib64/python3.9/site-packages/ludwig/utils/visualization_utils.py:1168: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax.set_yticklabels([""] + labels, rotation=45, ha="right")
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO: Comet.ml ExistingExperiment Summary
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO:   Data:
COMET INFO:     display_summary_level : 1
COMET INFO:     url                   : https://my_comet_ml_experiment_uri
COMET INFO:   Uploads:
COMET INFO:     filename    : 1
COMET INFO:     html        : 1
COMET INFO:     source_code : 1
COMET INFO: 
COMET INFO: Uploading metrics, params, and assets to Comet before program termination (may take several seconds)
COMET INFO: The Python SDK has 3600 seconds to finish uploading collected data

To Reproduce

To generate the data, I followed the Getting Started section of Ludwig's documentation (https://ludwig.ai/latest/getting_started). You also need a Comet ML account, which is free (https://www.comet.com/site/). Steps to reproduce the behavior:

  1. pip install ludwig[full] comet_ml
  2. wget https://ludwig.ai/latest/data/rotten_tomatoes.csv
  3. Create a rotten_tomatoes.yaml, following the Training section (https://ludwig.ai/latest/getting_started/train/).
  4. export COMET_API_KEY="..." and export COMET_PROJECT_NAME="..." (https://www.comet.com/docs/python-sdk/ludwig/#running-ludwig-with-comet).
  5. ludwig experiment --comet --config rotten_tomatoes.yaml --dataset rotten_tomatoes.csv (a programmatic sketch of this step follows this list).
  6. ludwig visualize --comet --visualization learning_curves --training_statistics ./results/experiment_run/training_statistics.json. This uploads the generated figures.
  7. ludwig visualize --comet --visualization confusion_matrix --ground_truth_metadata ./results/experiment_run/model/training_set_metadata.json --test_statistics ./results/experiment_run/test_statistics.json --top_n_classes 2. This generates the figures, but doesn't upload them to the Comet ML experiment.
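
For reference, a programmatic equivalent of step 5 would look roughly like the sketch below. I only tested the CLI path; the sketch assumes that the CometCallback class is importable from ludwig.contribs.comet (the comet.py file discussed later in this thread) and that LudwigModel accepts a callbacks argument.

from ludwig.api import LudwigModel
from ludwig.contribs.comet import CometCallback  # assumed import path

# Train and evaluate with the Comet callback attached, the programmatic
# counterpart of `ludwig experiment --comet ...` (sketch, untested).
model = LudwigModel(config="rotten_tomatoes.yaml", callbacks=[CometCallback()])
model.experiment(dataset="rotten_tomatoes.csv")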

Expected behavior

I expected the figures generated by the confusion_matrix visualization to be uploaded to my Comet ML experiment.

Environment

I tested this on an AWS EC2 m5.2xlarge instance with the Amazon Linux 2023 AMI and Python 3.9.16, but I don't think this specific infrastructure is relevant.

CostaFernando avatar Jun 28 '23 15:06 CostaFernando

Hi @CostaFernando!

I haven't tried uploading visualizations via the Comet ML third-party integration before, but looking at the logs you included, it seems the write request is being issued but not accepted because of this error:

COMET ERROR: '/home/ec2-user/ludwig-ai-playground/.comet.config' exists and force is not True; refusing to overwrite

Looking through our comet.py implementation and the Comet ML docs, I see this method, which confirms that overwrite=False by default.

Questions:

  1. Do you see the same error if you reverse the order of the commands?
ludwig visualize --comet --visualization confusion_matrix --ground_truth_metadata ./results/experiment_run/model/training_set_metadata.json --test_statistics ./results/experiment_run/test_statistics.json --top_n_classes 2

ludwig visualize --comet --visualization learning_curves --training_statistics ./results/experiment_run/training_statistics.json

Maybe the first log command always succeeds and the second doesn't.

  2. Do you see the same behavior if you specify --mlflow?

  3. Do you know where the ludwig-ai-playground name comes from? I wonder if this is also something that needs to be configured, perhaps with the environment variable also suggested in the logs:

COMET INFO: Couldn't find a Git repository in '/home/ec2-user/ludwig-ai-playground' nor in any parent directory. You can override where Comet is looking for a Git Patch by setting the configuration `COMET_GIT_DIRECTORY`

justinxzhao avatar Jun 28 '23 21:06 justinxzhao

Hi, @justinxzhao!

Thank you for your answer.

I think the error COMET ERROR: '/home/ec2-user/ludwig-ai-playground/.comet.config' exists and force is not True; refusing to overwrite is not the root cause, because it happens in both cases, with both the learning_curves and confusion_matrix visualizations. It happens because Ludwig's comet.py (line 124, self._save_config(config)) tries to write a .comet.config every time, but here the file has already been created.
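
A guard along these lines would presumably silence that error; this is a hypothetical sketch of what comet.py could do, not the actual implementation (_write_config_file is an invented helper):

import os

def _save_config(self, config, directory="."):
    # Hypothetical guard: only write .comet.config when it does not
    # already exist, so repeat runs don't trigger comet_ml's
    # "refusing to overwrite" error.
    config_path = os.path.join(directory, ".comet.config")
    if not os.path.exists(config_path):
        self._write_config_file(config, config_path)  # hypothetical helper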

  1. Do you see the same error if you reverse the order of the commands? Reversing the order of the commands makes no difference. I actually started out with confusion_matrix only and added learning_curves afterwards, for debugging purposes.

  2. Do you see the same behavior if you specify --mlflow? I don't use MLflow right now, so I haven't tried that flag. I can test it out, but does it work together with Comet ML?

  3. Do you know where the ludwig-ai-playground name comes from? I wonder if this is also something that needs to be configured, perhaps with the environment variable also suggested in the logs. This is just a folder I created on the EC2 instance to reproduce the scenario in a simpler setup for posting here.

CostaFernando avatar Jun 29 '23 21:06 CostaFernando

Hi @CostaFernando, thank you for reporting this issue. I work at Comet as the Integration Product Manager and I was able to reproduce your issue.

I started to take a look at the code, and it looks like logging the learning curves through integrations is supported by Ludwig; the callbacks are passed through here in the learning curves code: https://github.com/ludwig-ai/ludwig/blob/master/ludwig/visualize.py#L1390-L1392. But I don't see the callbacks being used in the confusion matrix visualization code: https://github.com/ludwig-ai/ludwig/blob/master/ludwig/visualize.py#L3679.
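
The dispatch on the learning curves path boils down to something like the sketch below. This is a simplification from my reading of the code, not the exact Ludwig implementation; I'm assuming the hook is named on_visualize_figure, which is what the Comet callback implements.

def dispatch_figure(fig, callbacks=None):
    # Hand the matplotlib figure to every registered integration
    # callback (sketch of the pattern only, names approximate).
    for callback in callbacks or []:
        callback.on_visualize_figure(fig)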

I tried modifying the code to pass the callbacks to the visualization utils functions and I got the figures logged in Comet. Here is the patch:

diff --git a/ludwig/visualize.py b/ludwig/visualize.py
index 4d5cea1e..4597b4e0 100644
--- a/ludwig/visualize.py
+++ b/ludwig/visualize.py
@@ -3685,6 +3685,7 @@ def confusion_matrix(
     model_names: Union[str, List[str]] = None,
     output_directory: str = None,
     file_format: str = "pdf",
+    callbacks: List[Callback] = None,
     **kwargs,
 ) -> None:
     """Show confusion matrix in the models predictions for each `output_feature_name`.
@@ -3758,7 +3759,7 @@ def confusion_matrix(
                         filename = filename_template_path.format(model_name_name, output_feature_name, "top" + str(k))
 
                     visualization_utils.confusion_matrix_plot(
-                        cm, labels[:k], output_feature_name=output_feature_name, filename=filename
+                        cm, labels[:k], output_feature_name=output_feature_name, filename=filename, callbacks=callbacks
                     )
 
                     entropies = []
@@ -3783,6 +3784,7 @@ def confusion_matrix(
                         labels=[labels[i] for i in class_desc_entropy],
                         title="Classes ranked by entropy of " "Confusion Matrix row",
                         filename=filename,
+                        callbacks=callbacks
                     )
     if not confusion_matrix_found:
         logger.error("Cannot find confusion_matrix in evaluation data")
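
With the callbacks wired through like this, the Comet callback can pick the figures up. From my reading of Ludwig's comet.py, the relevant hook looks roughly like the sketch below (simplified, not the full class), using comet_ml's Experiment.log_figure:

class CometCallback:
    # Simplified sketch: forward each visualized matplotlib figure
    # to the live Comet experiment.
    def on_visualize_figure(self, fig):
        if self.cometml_experiment:
            self.cometml_experiment.log_figure(figure=fig)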

Maybe there is a reason why only the learning curves visualization is connected to the callback system. @justinxzhao, do you know if there is a reason? And if not, could we connect all visualizations to the callback system?

Lothiraldan avatar Jul 04 '23 16:07 Lothiraldan

Thanks for the responses @CostaFernando, and thank you @Lothiraldan for jumping in and finding what is almost certainly the root cause. This makes sense, because MLflow logging is integrated via callbacks.

@w4nderlust or @jimthompson5802 do you know why only the learning curves are connected to the callback system and not other visualizations?

I don't see any issues with adding callbacks to all of the visualization functions. Since you found the culprit, @Lothiraldan, is this a change/PR that you would have cycles to put together? I'm happy to be the reviewer.

justinxzhao avatar Jul 05 '23 22:07 justinxzhao