evidently
problem using TextDescriptorsDriftMetric
Hi,
I'm applying some metrics to my dataset, which has text and numerical features. TextDescriptorsDriftMetric is applied to the column "content", but I get the error "ValueError: No objects to concatenate" when save_html is called.
report = Report(
    metrics=[
        ColumnDriftMetric(column_name="content"),
        ColumnDriftMetric(column_name="predictions"),
        DatasetDriftMetric(),
        DatasetMissingValuesMetric(),
        TextDescriptorsDriftMetric(column_name="content"),
    ]
)
report.run(
    reference_data=ref_df,
    current_data=cur_df,
    column_mapping=column_mapping,
)
report.save_html("monitor/aggregated.html")
I checked the reference and current datasets and they're equal and not empty. And column_mapping is defined as below. What could be the reason for this error?
from evidently import ColumnMapping
column_mapping = ColumnMapping(
    prediction="predictions",
    text_features=["content", "summary"],
    target=None,
)
Error output:
Traceback (most recent call last):
File "<string>", line 15, in <module>
File "/Users/tugcegunes/Library/Caches/pypoetry/virtualenvs/pov004-automatic-news-filter-uqh4bnaG-py3.9/lib/python3.9/site-packages/evidently/suite/base_suite.py", line 184, in save_html
dashboard_id, dashboard_info, graphs = self._build_dashboard_info()
File "/Users/tugcegunes/Library/Caches/pypoetry/virtualenvs/pov004-automatic-news-filter-uqh4bnaG-py3.9/lib/python3.9/site-packages/evidently/report/report.py", line 171, in _build_dashboard_info
html_info = renderer.render_html(test)
File "/Users/tugcegunes/Library/Caches/pypoetry/virtualenvs/pov004-automatic-news-filter-uqh4bnaG-py3.9/lib/python3.9/site-packages/evidently/metrics/data_drift/text_descriptors_drift_metric.py", line 210, in render_html
results = obj.get_result()
File "/Users/tugcegunes/Library/Caches/pypoetry/virtualenvs/pov004-automatic-news-filter-uqh4bnaG-py3.9/lib/python3.9/site-packages/evidently/base_metric.py", line 184, in get_result
raise result.exception
File "/Users/tugcegunes/Library/Caches/pypoetry/virtualenvs/pov004-automatic-news-filter-uqh4bnaG-py3.9/lib/python3.9/site-packages/evidently/suite/base_suite.py", line 393, in run_calculate
calculations[calculation] = calculation.calculate(data)
File "/Users/tugcegunes/Library/Caches/pypoetry/virtualenvs/pov004-automatic-news-filter-uqh4bnaG-py3.9/lib/python3.9/site-packages/evidently/metrics/data_drift/text_descriptors_drift_metric.py", line 105, in calculate
curr_text_df = pd.concat(
File "/Users/tugcegunes/Library/Caches/pypoetry/virtualenvs/pov004-automatic-news-filter-uqh4bnaG-py3.9/lib/python3.9/site-packages/pandas/util/_decorators.py", line 331, in wrapper
return func(*args, **kwargs)
File "/Users/tugcegunes/Library/Caches/pypoetry/virtualenvs/pov004-automatic-news-filter-uqh4bnaG-py3.9/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 368, in concat
op = _Concatenator(
File "/Users/tugcegunes/Library/Caches/pypoetry/virtualenvs/pov004-automatic-news-filter-uqh4bnaG-py3.9/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 425, in __init__
raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate
Hi @tugcegns,
Thanks for reporting the bug!
Could you share:
- Which evidently version are you using?
- Are you able to show the report in the notebook, or does the error appear there as well?
- Does your text data contain missing values in either of the two text feature columns?
Hi,
Thanks for the quick response! I'm using evidently 0.3.3. I've checked the data and there are no missing values. If I run it in a Jupyter notebook, I get the same error.
This is the error log:
File ~/opt/anaconda3/envs/llm_env/lib/python3.8/site-packages/evidently/suite/base_suite.py:393, in Suite.run_calculate(self, data)
391 logging.debug(f"Executing {type(calculation)}...")
392 try:
--> 393 calculations[calculation] = calculation.calculate(data)
394 except BaseException as ex:
395 calculations[calculation] = ErrorResult(ex)
File ~/opt/anaconda3/envs/llm_env/lib/python3.8/site-packages/evidently/metrics/data_drift/text_descriptors_drift_metric.py:105, in TextDescriptorsDriftMetric.calculate(self, data)
103 else:
104 agg_data = True
--> 105 curr_text_df = pd.concat(
106 [data.get_current_column(x.feature_name()) for x in list(self.generated_text_features.values())],
107 axis=1,
108 )
109 curr_text_df.columns = list(self.generated_text_features.keys())
111 ref_text_df = pd.concat(
112 [data.get_reference_column(x.feature_name()) for x in list(self.generated_text_features.values())],
113 axis=1,
114 )
File ~/opt/anaconda3/envs/llm_env/lib/python3.8/site-packages/pandas/core/reshape/concat.py:372, in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
369 elif copy and using_copy_on_write():
370 copy = False
--> 372 op = _Concatenator(
373 objs,
374 axis=axis,
375 ignore_index=ignore_index,
376 join=join,
377 keys=keys,
378 levels=levels,
379 names=names,
380 verify_integrity=verify_integrity,
381 copy=copy,
382 sort=sort,
383 )
385 return op.get_result()
File ~/opt/anaconda3/envs/llm_env/lib/python3.8/site-packages/pandas/core/reshape/concat.py:429, in _Concatenator.__init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
426 objs = list(objs)
428 if len(objs) == 0:
--> 429 raise ValueError("No objects to concatenate")
431 if keys is None:
432 objs = list(com.not_none(*objs))
ValueError: No objects to concatenate
Thanks @tugcegns - the error trace is helpful! Appreciate you reporting the bug.
Let us dig a bit deeper to find out what might be wrong. We cannot immediately reproduce it on our test datasets, but we'll run a few more tests to try to uncover the cause.
Please update us if you figure out the conditions (e.g. specific parameters of the dataset) under which it occurs.
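For anyone reading the trace: pandas raises this error only when the list passed to pd.concat is empty (the len(objs) == 0 check shown in the log), which suggests that generated_text_features held no entries at that point. The trace does not show why it was empty. A minimal sketch that reproduces the pandas behaviour in isolation; concat_columns is a hypothetical helper written for illustration, not part of evidently:

```python
import pandas as pd


def concat_columns(columns):
    """Concatenate Series side by side, failing early with a clearer message.

    Hypothetical helper for illustration only; evidently does not provide it.
    """
    columns = list(columns)
    if not columns:
        # Same condition pandas checks internally before raising.
        raise ValueError("no columns were generated, nothing to concatenate")
    return pd.concat(columns, axis=1)


# pandas itself rejects an empty list with the message seen in the report.
try:
    pd.concat([], axis=1)
except ValueError as exc:
    pandas_message = str(exc)

# With at least one Series the call succeeds.
combined = concat_columns([pd.Series([1, 2]), pd.Series([3, 4])])
```

Here pandas_message holds the text of the ValueError raised by pandas, and combined is a two-column DataFrame.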
Hi @tugcegns,
We ran on all our test datasets and unfortunately cannot reproduce the error.
- Is there a chance you can share a small example dataset for which the error occurs? It can be a very small and obfuscated sample, or a totally toy example as long as you get the same error.
- Also, are you able to generate a ColumnSummary metric for the text columns in your dataset?
summary_overview = Report(metrics=[
    ColumnSummaryMetric(column_name="summary"),
])
summary_overview.run(reference_data=ref_df, current_data=cur_df, column_mapping=column_mapping)
summary_overview
If yes, could you share a screenshot to understand the shape and parameters of the data?
Hello @elenasamuylova ,
I prepared the example dataset with text features, here you can find it: https://github.com/tugcegns/evidently-monitoring/blob/main/example_dataset.csv
ColumnSummaryMetric throws an error: "ValueError: unknown feature type text".
Could you let me know if anything is wrong with my dataset values? Thanks!
Hi @tugcegns,
Thanks for sharing! I reproduced your code on your example dataset in Colab. Here it is: https://colab.research.google.com/drive/1H4SMRHzWC_GFfqGAnhMTQGUtpeQ6Rlmc?usp=sharing
It appears to work correctly.
To double-check, did you import NLTK (required for some descriptors) in your code?
import nltk
nltk.download('words')
nltk.download('wordnet')
nltk.download('omw-1.4')
nltk.download('vader_lexicon')
If yes, and the error still persists, could you share any other details about your environment? Are you running the code specifically in a Jupyter notebook?
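If it helps, the downloads above could be wrapped in a small helper that runs before the report and tells you which resources failed to download. This is only a sketch: ensure_nltk_resources and its download parameter are hypothetical names introduced here, and it assumes nltk.download returns a boolean success flag and accepts quiet=True.

```python
from typing import Callable, Iterable, List, Optional

# Resource names taken from the nltk.download(...) calls above.
NLTK_RESOURCES = ("words", "wordnet", "omw-1.4", "vader_lexicon")


def ensure_nltk_resources(
    resources: Iterable[str] = NLTK_RESOURCES,
    download: Optional[Callable[[str], bool]] = None,
) -> List[str]:
    """Try to download each resource and return the names that failed.

    Hypothetical helper, not part of evidently or NLTK. `download` can be
    replaced by any callable taking a resource name and returning True on
    success, which makes the helper easy to test offline.
    """
    if download is None:
        import nltk  # imported lazily so the helper can be imported without NLTK

        def download(name: str) -> bool:
            # Assumption: nltk.download returns True on success, False otherwise.
            return bool(nltk.download(name, quiet=True))

    return [name for name in resources if not download(name)]


# Example usage before building the report (requires NLTK and network access):
#     failed = ensure_nltk_resources()
#     if failed:
#         raise RuntimeError(f"Missing NLTK resources: {failed}")
```

Calling it once at the start of the script keeps the NLTK setup in one place instead of relying on the resources already being present in the environment.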
Hi @elenasamuylova,
Importing NLTK actually solved the problem! Thanks for sharing this colab file, it helped a lot! 👍
Glad it is solved @tugcegns!