evidently problem using TextDescriptorsDriftMetric

Hi,

I'm applying some metrics to my dataset, which has text and numerical features. TextDescriptorsDriftMetric is applied to the column content, but I get the error "ValueError: No objects to concatenate" when save_html is called.

  report = Report(
      metrics=[
          ColumnDriftMetric(column_name="content"),
          ColumnDriftMetric(column_name="predictions"), 
          DatasetDriftMetric(),
          DatasetMissingValuesMetric(),
          TextDescriptorsDriftMetric(column_name="content"),
      ]
  )
  report.run(
      reference_data=ref_df,
      current_data=cur_df,
      column_mapping=column_mapping,
  )

  report.save_html("monitor/aggregated.html")

I checked the reference and current datasets and they're equal and not empty. And column_mapping is defined as below. What could be the reason for this error?

from evidently import ColumnMapping

column_mapping = ColumnMapping(
    prediction="predictions",
    text_features=["content", "summary"],
    target=None,
)

Error output:

Traceback (most recent call last):
  File "<string>", line 15, in <module>
  File "/Users/tugcegunes/Library/Caches/pypoetry/virtualenvs/pov004-automatic-news-filter-uqh4bnaG-py3.9/lib/python3.9/site-packages/evidently/suite/base_suite.py", line 184, in save_html
    dashboard_id, dashboard_info, graphs = self._build_dashboard_info()
  File "/Users/tugcegunes/Library/Caches/pypoetry/virtualenvs/pov004-automatic-news-filter-uqh4bnaG-py3.9/lib/python3.9/site-packages/evidently/report/report.py", line 171, in _build_dashboard_info
    html_info = renderer.render_html(test)
  File "/Users/tugcegunes/Library/Caches/pypoetry/virtualenvs/pov004-automatic-news-filter-uqh4bnaG-py3.9/lib/python3.9/site-packages/evidently/metrics/data_drift/text_descriptors_drift_metric.py", line 210, in render_html
    results = obj.get_result()
  File "/Users/tugcegunes/Library/Caches/pypoetry/virtualenvs/pov004-automatic-news-filter-uqh4bnaG-py3.9/lib/python3.9/site-packages/evidently/base_metric.py", line 184, in get_result
    raise result.exception
  File "/Users/tugcegunes/Library/Caches/pypoetry/virtualenvs/pov004-automatic-news-filter-uqh4bnaG-py3.9/lib/python3.9/site-packages/evidently/suite/base_suite.py", line 393, in run_calculate
    calculations[calculation] = calculation.calculate(data)
  File "/Users/tugcegunes/Library/Caches/pypoetry/virtualenvs/pov004-automatic-news-filter-uqh4bnaG-py3.9/lib/python3.9/site-packages/evidently/metrics/data_drift/text_descriptors_drift_metric.py", line 105, in calculate
    curr_text_df = pd.concat(
  File "/Users/tugcegunes/Library/Caches/pypoetry/virtualenvs/pov004-automatic-news-filter-uqh4bnaG-py3.9/lib/python3.9/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/Users/tugcegunes/Library/Caches/pypoetry/virtualenvs/pov004-automatic-news-filter-uqh4bnaG-py3.9/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 368, in concat
    op = _Concatenator(
  File "/Users/tugcegunes/Library/Caches/pypoetry/virtualenvs/pov004-automatic-news-filter-uqh4bnaG-py3.9/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 425, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate

Jul 04 '23 14:07 tugcegns

Hi @tugcegns,

Thanks for reporting the bug!

Could you share:

Which evidently version are you using?
Are you able to show the report in the notebook, or does the error appear there as well?
Does your text data contain missing values in either of the two text feature columns?

Jul 04 '23 15:07 elenasamuylova

Hi,

Thanks for the quick response! I'm using evidently 0.3.3, and I've checked the data: there are no missing values. If I run it in a Jupyter notebook, I get the same error.

This is the error log:

File ~/opt/anaconda3/envs/llm_env/lib/python3.8/site-packages/evidently/suite/base_suite.py:393, in Suite.run_calculate(self, data)
    391 logging.debug(f"Executing {type(calculation)}...")
    392 try:
--> 393     calculations[calculation] = calculation.calculate(data)
    394 except BaseException as ex:
    395     calculations[calculation] = ErrorResult(ex)

File ~/opt/anaconda3/envs/llm_env/lib/python3.8/site-packages/evidently/metrics/data_drift/text_descriptors_drift_metric.py:105, in TextDescriptorsDriftMetric.calculate(self, data)
    103 else:
    104     agg_data = True
--> 105 curr_text_df = pd.concat(
    106     [data.get_current_column(x.feature_name()) for x in list(self.generated_text_features.values())],
    107     axis=1,
    108 )
    109 curr_text_df.columns = list(self.generated_text_features.keys())
    111 ref_text_df = pd.concat(
    112     [data.get_reference_column(x.feature_name()) for x in list(self.generated_text_features.values())],
    113     axis=1,
    114 )

File ~/opt/anaconda3/envs/llm_env/lib/python3.8/site-packages/pandas/core/reshape/concat.py:372, in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    369 elif copy and using_copy_on_write():
    370     copy = False
--> 372 op = _Concatenator(
    373     objs,
    374     axis=axis,
    375     ignore_index=ignore_index,
    376     join=join,
    377     keys=keys,
    378     levels=levels,
    379     names=names,
    380     verify_integrity=verify_integrity,
    381     copy=copy,
    382     sort=sort,
    383 )
    385 return op.get_result()

File ~/opt/anaconda3/envs/llm_env/lib/python3.8/site-packages/pandas/core/reshape/concat.py:429, in _Concatenator.__init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
    426     objs = list(objs)
    428 if len(objs) == 0:
--> 429     raise ValueError("No objects to concatenate")
    431 if keys is None:
    432     objs = list(com.not_none(*objs))

ValueError: No objects to concatenate

Jul 05 '23 12:07 tugcegns

Thanks @tugcegns - the error trace is helpful! Appreciate you reporting the bug.

Let us dig a bit deeper to test what might be wrong. We cannot immediately reproduce it on our test datasets, but we'll run a few more tests to try to uncover the cause.

Please update us if you figure out the condition (e.g. specific parameters of the dataset) under which it occurs.

Jul 05 '23 15:07 elenasamuylova

Hi @tugcegns,

We ran it on all our test datasets and unfortunately cannot reproduce the error.

Is there a chance you can share a small example dataset for which the error occurs? It can be a very small and obfuscated sample, or a totally toy example as long as you get the same error.
Also, are you able to generate a ColumnSummary metric for the text columns in your dataset?

summary_overview = Report(metrics=[
   ColumnSummaryMetric(column_name="summary"),
])

summary_overview.run(reference_data=ref_df, current_data=cur_df, column_mapping=column_mapping)
summary_overview

If yes, could you share a screenshot so we can understand the shape and parameters of the data?

Jul 06 '23 17:07 elenasamuylova

Hello @elenasamuylova ,

I prepared the example dataset with text features, here you can find it: https://github.com/tugcegns/evidently-monitoring/blob/main/example_dataset.csv

ColumnSummaryMetric throws an error: "ValueError: unknown feature type text".

Could you let me know if anything is wrong with my dataset values? Thanks!

Jul 10 '23 11:07 tugcegns

Hi @tugcegns,

Thanks for sharing! I reproduced your code on your example dataset in Colab. Here it is: https://colab.research.google.com/drive/1H4SMRHzWC_GFfqGAnhMTQGUtpeQ6Rlmc?usp=sharing

It appears to work correctly.

To double-check, did you import NLTK (required for some descriptors) in your code?

import nltk
nltk.download('words')
nltk.download('wordnet')
nltk.download('omw-1.4')
nltk.download('vader_lexicon')

If yes, and the error still persists, could you share any other details about your environment? Are you running the code specifically in a Jupyter notebook?

Jul 10 '23 15:07 elenasamuylova

Hi @elenasamuylova,

Importing NLTK actually solved the problem! Thanks for sharing this colab file, it helped a lot! 👍

Jul 11 '23 13:07 tugcegns

Glad it is solved @tugcegns!
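
For anyone who lands here with the same trace: the failing call at line 105 of text_descriptors_drift_metric.py is pd.concat over a list built from self.generated_text_features, so the error means that list was empty when the metric ran. The thread does not establish why it was empty, only that downloading the NLTK resources made the error go away. Below is a minimal sketch of the pandas behaviour on its own, without evidently; the variable names, column name, and values are made up for illustration.

```python
import pandas as pd

# Stand-in for the list comprehension in TextDescriptorsDriftMetric.calculate:
# an empty list is what pd.concat receives when no descriptor columns exist.
generated_columns = []

try:
    pd.concat(generated_columns, axis=1)
    message = None
except ValueError as err:
    message = str(err)

print(message)  # No objects to concatenate

# With at least one Series, the same call succeeds.
# The column name and values below are hypothetical.
text_length = pd.Series([11, 27, 5], name="Text Length")
combined = pd.concat([text_length], axis=1)
print(combined.shape)  # (3, 1)
```

So if you see this ValueError from TextDescriptorsDriftMetric, it may be worth checking first that the NLTK downloads listed above have run in the same environment.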

Jul 11 '23 17:07 elenasamuylova
