Not uploading confusion_matrix (and others) figure to Comet ML
Describe the bug
Hi, guys!
I ran an experiment and tried to send the confusion_matrix visualization to Comet ML, following the Third-Party Integrations section of Ludwig's documentation (https://ludwig.ai/latest/user_guide/integrations/#comet-ml), but I could not get the figure uploaded to Comet ML.
I tested a few times with different configurations: I'm able to upload the learning_curves visualizations to Comet ML, but not other visualizations such as confusion_matrix, roc_curves_from_test_statistics, and precision_recall_curves_from_test_statistics.
When I run
ludwig visualize --comet --visualization learning_curves --training_statistics ./results/experiment_run/training_statistics.json
it correctly uploads the generated figures and the following output appears:
COMET INFO: Couldn't find a Git repository in '/home/ec2-user/ludwig-ai-playground' nor in any parent directory. You can override where Comet is looking for a Git Patch by setting the configuration `COMET_GIT_DIRECTORY`
COMET INFO: Experiment is live on comet.com https://my_comet_ml_experiment_uri
COMET WARNING: Experiment.set_code(code=...) is deprecated, use Experiment.log_code(code=..., code_name=...) instead
COMET ERROR: '/home/ec2-user/ludwig-ai-playground/.comet.config' exists and force is not True; refusing to overwrite
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO: Comet.ml ExistingExperiment Summary
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO: Data:
COMET INFO: display_summary_level : 1
COMET INFO: url : https://my_comet_ml_experiment_uri
COMET INFO: Uploads:
COMET INFO: figures : 3
COMET INFO: filename : 1
COMET INFO: html : 1
COMET INFO: source_code : 1
COMET INFO:
COMET INFO: Uploading metrics, params, and assets to Comet before program termination (may take several seconds)
COMET INFO: The Python SDK has 3600 seconds to finish uploading collected data
COMET INFO: Waiting for completion of the file uploads (may take several seconds)
COMET INFO: The Python SDK has 10800 seconds to finish uploading collected data
COMET INFO: All files uploaded, waiting for confirmation they have been all received
But when I run
ludwig visualize --comet --visualization confusion_matrix --ground_truth_metadata ./results/experiment_run/model/training_set_metadata.json --test_statistics ./results/experiment_run/test_statistics.json --top_n_classes 2
it doesn't upload any figure. The following output appears:
COMET INFO: Couldn't find a Git repository in '/home/ec2-user/ludwig-ai-playground' nor in any parent directory. You can override where Comet is looking for a Git Patch by setting the configuration `COMET_GIT_DIRECTORY`
COMET INFO: Experiment is live on comet.com https://my_comet_ml_experiment_uri
COMET WARNING: Experiment.set_code(code=...) is deprecated, use Experiment.log_code(code=..., code_name=...) instead
COMET ERROR: '/home/ec2-user/ludwig-ai-playground/.comet.config' exists and force is not True; refusing to overwrite
/home/ec2-user/ludwig-ai-playground/venv/lib64/python3.9/site-packages/ludwig/utils/visualization_utils.py:1167: UserWarning: FixedFormatter should only be used together with FixedLocator
ax.set_xticklabels([""] + labels, rotation=45, ha="left")
/home/ec2-user/ludwig-ai-playground/venv/lib64/python3.9/site-packages/ludwig/utils/visualization_utils.py:1168: UserWarning: FixedFormatter should only be used together with FixedLocator
ax.set_yticklabels([""] + labels, rotation=45, ha="right")
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO: Comet.ml ExistingExperiment Summary
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO: Data:
COMET INFO: display_summary_level : 1
COMET INFO: url : https://my_comet_ml_experiment_uri
COMET INFO: Uploads:
COMET INFO: filename : 1
COMET INFO: html : 1
COMET INFO: source_code : 1
COMET INFO:
COMET INFO: Uploading metrics, params, and assets to Comet before program termination (may take several seconds)
COMET INFO: The Python SDK has 3600 seconds to finish uploading collected data
To Reproduce
Steps to reproduce the behavior: to generate the data, I followed the Getting Started section of Ludwig's documentation (https://ludwig.ai/latest/getting_started). You also need a Comet ML account, which is free (https://www.comet.com/site/).
- pip install ludwig[full] comet_ml
- wget https://ludwig.ai/latest/data/rotten_tomatoes.csv
- Created a rotten_tomatoes.yaml, following the Training section (https://ludwig.ai/latest/getting_started/train/); a sketch of the config is shown after this list.
- export COMET_API_KEY="..." and export COMET_PROJECT_NAME="..." (https://www.comet.com/docs/python-sdk/ludwig/#running-ludwig-with-comet).
- ludwig experiment --comet --config rotten_tomatoes.yaml --dataset rotten_tomatoes.csv
- ludwig visualize --comet --visualization learning_curves --training_statistics ./results/experiment_run/training_statistics.json
  This uploads the generated figures.
- ludwig visualize --comet --visualization confusion_matrix --ground_truth_metadata ./results/experiment_run/model/training_set_metadata.json --test_statistics ./results/experiment_run/test_statistics.json --top_n_classes 2
  This generates the figures, but does not upload them to the Comet ML experiment.
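For completeness, the config I created is roughly the sketch below, written out from Python so the reproduction stays scriptable. The feature names are copied from the Getting Started docs, so treat the Training page linked above as the authoritative version of this config.

```python
# Sketch of the rotten_tomatoes.yaml I used (feature names taken from the Getting
# Started docs; the Training page is the authoritative version of this config).
import yaml

config = {
    "input_features": [
        {"name": "genres", "type": "set", "preprocessing": {"tokenizer": "comma"}},
        {"name": "content_rating", "type": "category"},
        {"name": "top_critic", "type": "binary"},
        {"name": "runtime", "type": "number"},
        {"name": "review_content", "type": "text"},
    ],
    "output_features": [{"name": "recommended", "type": "binary"}],
}

# Write the config next to the downloaded CSV so the CLI commands above can use it.
with open("rotten_tomatoes.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```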
Expected behavior
I expected the figures generated by ludwig visualize --visualization confusion_matrix to be uploaded to my Comet ML experiment.
Environment
I tested this on an AWS EC2 m5.2xlarge instance with the Amazon Linux 2023 AMI and Python 3.9.16, but I don't think this specific infrastructure is particularly relevant.
Hi @CostaFernando!
I haven't tried uploading visualizations via the Comet ML third-party integration before, but taking a look at the logs you included, it looks like the write request is being issued but not accepted because of this error:
COMET ERROR: '/home/ec2-user/ludwig-ai-playground/.comet.config' exists and force is not True; refusing to overwrite
Looking through our comet.py implementation and going through the Comet ML docs, I see this method, which confirms that overwrite=False by default.
Questions:
- Do you see the same error if you reverse the order of the commands?
ludwig visualize --comet --visualization confusion_matrix --ground_truth_metadata ./results/experiment_run/model/training_set_metadata.json --test_statistics ./results/experiment_run/test_statistics.json --top_n_classes 2
ludwig visualize --comet --visualization learning_curves --training_statistics ./results/experiment_run/training_statistics.json
Maybe the first log command always succeeds and the second doesn't.
- Do you see the same behavior if you specify --mlflow?
- Do you know where the ludwig-ai-playground name comes from? I wonder if this is also something that needs to be configured, perhaps with the environment variable also suggested in the logs.
COMET INFO: Couldn't find a Git repository in '/home/ec2-user/ludwig-ai-playground' nor in any parent directory. You can override where Comet is looking for a Git Patch by setting the configuration `COMET_GIT_DIRECTORY`
Hi, @justinxzhao!
Thank you for your answer.
I think the error COMET ERROR: '/home/ec2-user/ludwig-ai-playground/.comet.config' exists and force is not True; refusing to overwrite is not the root cause, because it happens in both cases, with the learning_curves and confusion_matrix visualizations. It happens because Ludwig's comet.py, at line 124, calls self._save_config(config) and tries to write a .comet.config every time, but in this case the file has already been created.
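If it helps, below is a minimal sketch of how that write could be guarded; only the _save_config(config) call comes from comet.py, while the helper name, the comet_callback argument, and the directory default are my assumptions. This would silence the COMET ERROR, but it shouldn't change the missing figure uploads.

```python
import os

def save_comet_config_if_missing(comet_callback, config, directory="."):
    # Hypothetical guard: only write .comet.config when the file is not already there,
    # instead of asking comet_ml to overwrite it (which it refuses without force=True).
    config_path = os.path.join(directory, ".comet.config")
    if not os.path.exists(config_path):
        # _save_config(config) is the call mentioned above (comet.py, line 124).
        comet_callback._save_config(config)
```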
- Do you see the same error if you reverse the order of the commands? Changing the order of the commands doesn't make any difference. I started out trying only confusion_matrix; I only used learning_curves afterwards, for debugging purposes.
- Do you see the same behavior if you specify --mlflow? I'm not using MLflow right now, so I didn't use this flag. I can test it out; does it work with Comet ML?
- Do you know where the ludwig-ai-playground name comes from? This was just a folder that I created on the EC2 instance to reproduce the scenario in a simpler setup to post here.
Hi @CostaFernando, thank you for reporting this issue. I work at Comet as the Integration Product Manager and I was able to reproduce it.
I started taking a look at the code, and it looks like logging the learning curves through integrations is supported by Ludwig, here in the code for learning curves: https://github.com/ludwig-ai/ludwig/blob/master/ludwig/visualize.py#L1390-L1392. But I don't see the callbacks being used in the confusion matrix visualization code: https://github.com/ludwig-ai/ludwig/blob/master/ludwig/visualize.py#L3679.
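For context, the way the learning curves figures end up in Comet looks roughly like the sketch below: the plotting helpers in visualization_utils hand the matplotlib figure to every callback before saving or showing it. This is a simplified sketch of that pattern, not the exact Ludwig code, and it assumes the hook is on_visualize_figure, which is what the learning_curves path appears to use.

```python
import matplotlib.pyplot as plt

def plot_with_callbacks(fig, filename=None, callbacks=None):
    # Simplified sketch of the pattern in visualization_utils: let every callback see
    # the figure first (the Comet callback logs it to the experiment at this point),
    # then save or display it as usual.
    for callback in callbacks or []:
        callback.on_visualize_figure(fig)
    if filename:
        fig.savefig(filename)
    else:
        plt.show()
```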
I tried modifying the code to pass the callbacks to the visualization utils functions and I got the figures logged in Comet. Here is the patch:
diff --git a/ludwig/visualize.py b/ludwig/visualize.py
index 4d5cea1e..4597b4e0 100644
--- a/ludwig/visualize.py
+++ b/ludwig/visualize.py
@@ -3685,6 +3685,7 @@ def confusion_matrix(
model_names: Union[str, List[str]] = None,
output_directory: str = None,
file_format: str = "pdf",
+ callbacks: List[Callback] = None,
**kwargs,
) -> None:
"""Show confusion matrix in the models predictions for each `output_feature_name`.
@@ -3758,7 +3759,7 @@ def confusion_matrix(
filename = filename_template_path.format(model_name_name, output_feature_name, "top" + str(k))
visualization_utils.confusion_matrix_plot(
- cm, labels[:k], output_feature_name=output_feature_name, filename=filename
+ cm, labels[:k], output_feature_name=output_feature_name, filename=filename, callbacks=callbacks
)
entropies = []
@@ -3783,6 +3784,7 @@ def confusion_matrix(
labels=[labels[i] for i in class_desc_entropy],
title="Classes ranked by entropy of " "Confusion Matrix row",
filename=filename,
+ callbacks=callbacks
)
if not confusion_matrix_found:
logger.error("Cannot find confusion_matrix in evaluation data")
Maybe there is a reason why only the learning curves visualization is connected to the callback system. @justinxzhao, do you know if there is a reason? And if not, could we connect all visualizations to the callback system?
Thanks for the responses, @CostaFernando, and thank you @Lothiraldan for jumping in and finding what is almost certainly the root cause. This makes sense, because MLflow logging is integrated via callbacks.
@w4nderlust or @jimthompson5802 do you know why only the learning curves are connected to the callback system and not other visualizations?
I don't see any issues with adding callbacks to all of the visualization functions. Since you found the culprit, @Lothiraldan, is this a change/PR that you would have cycles to put together? I'm happy to be the reviewer.