problem using TextDescriptorsDriftMetric

Open tugcegns opened this issue 1 year ago • 8 comments

Hi,

I'm applying some metrics to my dataset, which has text and numerical features. TextDescriptorsDriftMetric is applied to the column content, but I get the error "ValueError: No objects to concatenate" when save_html is called.

  report = Report(
      metrics=[
          ColumnDriftMetric(column_name="content"),
          ColumnDriftMetric(column_name="predictions"), 
          DatasetDriftMetric(),
          DatasetMissingValuesMetric(),
          TextDescriptorsDriftMetric(column_name="content"),
      ]
  )
  report.run(
      reference_data=ref_df,
      current_data=cur_df,
      column_mapping=column_mapping,
  )

  report.save_html("monitor/aggregated.html")

I checked the reference and current datasets and they're equal and not empty. And column_mapping is defined as below. What could be the reason for this error?

from evidently import ColumnMapping

column_mapping = ColumnMapping(
    prediction="predictions",
    text_features=["content", "summary"],
    target=None,
)

Error output:

Traceback (most recent call last):
  File "<string>", line 15, in <module>
  File "/Users/tugcegunes/Library/Caches/pypoetry/virtualenvs/pov004-automatic-news-filter-uqh4bnaG-py3.9/lib/python3.9/site-packages/evidently/suite/base_suite.py", line 184, in save_html
    dashboard_id, dashboard_info, graphs = self._build_dashboard_info()
  File "/Users/tugcegunes/Library/Caches/pypoetry/virtualenvs/pov004-automatic-news-filter-uqh4bnaG-py3.9/lib/python3.9/site-packages/evidently/report/report.py", line 171, in _build_dashboard_info
    html_info = renderer.render_html(test)
  File "/Users/tugcegunes/Library/Caches/pypoetry/virtualenvs/pov004-automatic-news-filter-uqh4bnaG-py3.9/lib/python3.9/site-packages/evidently/metrics/data_drift/text_descriptors_drift_metric.py", line 210, in render_html
    results = obj.get_result()
  File "/Users/tugcegunes/Library/Caches/pypoetry/virtualenvs/pov004-automatic-news-filter-uqh4bnaG-py3.9/lib/python3.9/site-packages/evidently/base_metric.py", line 184, in get_result
    raise result.exception
  File "/Users/tugcegunes/Library/Caches/pypoetry/virtualenvs/pov004-automatic-news-filter-uqh4bnaG-py3.9/lib/python3.9/site-packages/evidently/suite/base_suite.py", line 393, in run_calculate
    calculations[calculation] = calculation.calculate(data)
  File "/Users/tugcegunes/Library/Caches/pypoetry/virtualenvs/pov004-automatic-news-filter-uqh4bnaG-py3.9/lib/python3.9/site-packages/evidently/metrics/data_drift/text_descriptors_drift_metric.py", line 105, in calculate
    curr_text_df = pd.concat(
  File "/Users/tugcegunes/Library/Caches/pypoetry/virtualenvs/pov004-automatic-news-filter-uqh4bnaG-py3.9/lib/python3.9/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/Users/tugcegunes/Library/Caches/pypoetry/virtualenvs/pov004-automatic-news-filter-uqh4bnaG-py3.9/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 368, in concat
    op = _Concatenator(
  File "/Users/tugcegunes/Library/Caches/pypoetry/virtualenvs/pov004-automatic-news-filter-uqh4bnaG-py3.9/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 425, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate

tugcegns avatar Jul 04 '23 14:07 tugcegns
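
Editorial note: the traceback above shows the exception being raised inside run_calculate but only surfacing later, when save_html asks the metric for its result (get_result re-raises result.exception). The sketch below illustrates that deferred-error pattern in plain Python. It is a simplified illustration, not evidently's actual code: the function names mirror the traceback, but their signatures and the "text_drift" / "row_count" keys are assumptions made for the example.

```python
class ErrorResult:
    """Holds an exception raised during calculation so it can be re-raised later."""

    def __init__(self, exception):
        self.exception = exception


def run_calculate(calculations):
    """Run every calculation; store failures instead of raising immediately."""
    results = {}
    for name, calculate in calculations.items():
        try:
            results[name] = calculate()
        except Exception as ex:  # simplified; a real library may catch more broadly
            results[name] = ErrorResult(ex)
    return results


def get_result(results, name):
    """Return a stored result, re-raising the original exception if it failed."""
    result = results[name]
    if isinstance(result, ErrorResult):
        raise result.exception
    return result


def failing_metric():
    # Hypothetical stand-in for a metric whose calculation fails.
    raise ValueError("No objects to concatenate")


# The "run" step completes without raising, even though one metric failed.
results = run_calculate({"text_drift": failing_metric, "row_count": lambda: 42})
```

Under this pattern the run step looks successful, and the error only appears when the failed result is read, which would be consistent with the report failing at save_html rather than at run.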

Hi @tugcegns,

Thanks for reporting the bug!

Could you share:

  • Which evidently version are you using?
  • Are you able to show the report in the notebook, or does the error appear there as well?
  • Does your text data contain missing values in either of the two text feature columns?

elenasamuylova avatar Jul 04 '23 15:07 elenasamuylova

Hi,

Thanks for the quick response! I'm using evidently 0.3.3, and I've checked the data: there are no missing values. If I run it in a Jupyter notebook, I get the same error.

This is the error log:

File ~/opt/anaconda3/envs/llm_env/lib/python3.8/site-packages/evidently/suite/base_suite.py:393, in Suite.run_calculate(self, data)
    391 logging.debug(f"Executing {type(calculation)}...")
    392 try:
--> 393     calculations[calculation] = calculation.calculate(data)
    394 except BaseException as ex:
    395     calculations[calculation] = ErrorResult(ex)

File ~/opt/anaconda3/envs/llm_env/lib/python3.8/site-packages/evidently/metrics/data_drift/text_descriptors_drift_metric.py:105, in TextDescriptorsDriftMetric.calculate(self, data)
    103 else:
    104     agg_data = True
--> 105 curr_text_df = pd.concat(
    106     [data.get_current_column(x.feature_name()) for x in list(self.generated_text_features.values())],
    107     axis=1,
    108 )
    109 curr_text_df.columns = list(self.generated_text_features.keys())
    111 ref_text_df = pd.concat(
    112     [data.get_reference_column(x.feature_name()) for x in list(self.generated_text_features.values())],
    113     axis=1,
    114 )

File ~/opt/anaconda3/envs/llm_env/lib/python3.8/site-packages/pandas/core/reshape/concat.py:372, in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    369 elif copy and using_copy_on_write():
    370     copy = False
--> 372 op = _Concatenator(
    373     objs,
    374     axis=axis,
    375     ignore_index=ignore_index,
    376     join=join,
    377     keys=keys,
    378     levels=levels,
    379     names=names,
    380     verify_integrity=verify_integrity,
    381     copy=copy,
    382     sort=sort,
    383 )
    385 return op.get_result()

File ~/opt/anaconda3/envs/llm_env/lib/python3.8/site-packages/pandas/core/reshape/concat.py:429, in _Concatenator.__init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
    426     objs = list(objs)
    428 if len(objs) == 0:
--> 429     raise ValueError("No objects to concatenate")
    431 if keys is None:
    432     objs = list(com.not_none(*objs))

ValueError: No objects to concatenate

tugcegns avatar Jul 05 '23 12:07 tugcegns
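
Editorial note: the log above ends in pandas' check `len(objs) == 0`, so the list passed to pd.concat was empty. In the quoted evidently code that list is built by iterating over self.generated_text_features, which suggests that dictionary was empty when calculate ran. The thread does not confirm why, so treat that as an inference. A minimal sketch reproducing the same pandas error, with an empty dict as a hypothetical stand-in for generated_text_features:

```python
import pandas as pd

# Hypothetical stand-in: no text descriptor features were generated.
generated_text_features = {}

# Mirrors the shape of the list comprehension in the traceback:
# one column per generated feature, so an empty dict gives an empty list.
columns = [
    pd.Series([0.0, 1.0], name=name)
    for name in generated_text_features
]

message = None
try:
    pd.concat(columns, axis=1)
except ValueError as err:
    # pandas refuses to concatenate an empty sequence of objects.
    message = str(err)

print(message)
```

The error therefore says nothing about the rows of the dataset itself; it only indicates that there were no columns to combine.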

Thanks @tugcegns - the error trace is helpful! Appreciate you reporting the bug.

Let us dig a bit deeper to test what might be wrong. We cannot immediately reproduce it on our test datasets, but we'll run a few more tests to try to uncover the cause.

Please update us if you figure out the conditions (e.g. specific parameters of the dataset) under which it occurs.

elenasamuylova avatar Jul 05 '23 15:07 elenasamuylova

Hi @tugcegns,

We ran on all our test datasets and unfortunately cannot reproduce the error.

  1. Is there a chance you can share a small example dataset for which the error occurs? It can be a very small and obfuscated sample, or a totally toy example as long as you get the same error.

  2. Also, are you able to generate a ColumnSummary metric for the text columns in your dataset?

summary_overview = Report(metrics=[
   ColumnSummaryMetric(column_name="summary"),
])

summary_overview.run(reference_data=ref_df, current_data=cur_df, column_mapping=column_mapping)
summary_overview

If yes, could you share a screenshot so we can understand the shape and parameters of the data?

elenasamuylova avatar Jul 06 '23 17:07 elenasamuylova

Hello @elenasamuylova ,

I prepared the example dataset with text features, here you can find it: https://github.com/tugcegns/evidently-monitoring/blob/main/example_dataset.csv

For the ColumnSummaryMetric, it throws an error: ValueError: unknown feature type text

Could you let me know if anything is wrong with my dataset values? Thanks!

tugcegns avatar Jul 10 '23 11:07 tugcegns

Hi @tugcegns,

Thanks for sharing! I reproduced your code on your example dataset in Colab. Here it is: https://colab.research.google.com/drive/1H4SMRHzWC_GFfqGAnhMTQGUtpeQ6Rlmc?usp=sharing

It appears to work correctly.

To double-check, did you import NLTK (required for some descriptors) in your code?

import nltk
nltk.download('words')
nltk.download('wordnet')
nltk.download('omw-1.4')
nltk.download('vader_lexicon')

If yes, and the error still persists, could you share any other details about your environment? Are you running the code specifically in a Jupyter notebook?

elenasamuylova avatar Jul 10 '23 15:07 elenasamuylova

Hi @elenasamuylova,

Importing NLTK actually solved the problem! Thanks for sharing this colab file, it helped a lot! 👍

tugcegns avatar Jul 11 '23 13:07 tugcegns

Glad it is solved @tugcegns!

elenasamuylova avatar Jul 11 '23 17:07 elenasamuylova
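
Editorial note: since the fix in this thread was downloading the NLTK resources, one way to catch the problem early is to fetch them explicitly before building the report and fail loudly if any download does not succeed. The sketch below is one possible approach, not part of evidently. The helper names are hypothetical; the resource names are the ones listed in the thread, and nltk.download is assumed to return a true value on success.

```python
from typing import Callable, Iterable, List

# Resource names taken from the nltk.download(...) calls in the thread.
NLTK_RESOURCES = ("words", "wordnet", "omw-1.4", "vader_lexicon")


def failed_downloads(
    download: Callable[[str], bool],
    resources: Iterable[str] = NLTK_RESOURCES,
) -> List[str]:
    """Try to download each resource and return the names that failed."""
    return [name for name in resources if not download(name)]


def nltk_download(name: str) -> bool:
    """Download one NLTK resource; NLTK is imported lazily, only when called."""
    import nltk

    return bool(nltk.download(name, quiet=True))


def ensure_nltk_resources() -> None:
    """Raise a descriptive error if any required resource could not be fetched."""
    failed = failed_downloads(nltk_download)
    if failed:
        raise RuntimeError(f"Could not download NLTK resources: {failed}")
```

Calling ensure_nltk_resources() once before report.run(...) would turn a missing resource into an explicit error at startup rather than a pandas error at render time.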