
What are the different metrics mentioned

pradeepdev-1995 opened this issue on Feb 28, 2024 · 2 comments

I can see multiple metrics when I call this code:

import whylogs as why
from langkit.whylogs.samples import load_chats
chats = load_chats()
results = why.log(chats)

[Screenshot from 2024-02-28 18-27-30] What do these metrics actually mean? Where can I find the documentation?

pradeepdev-1995 avatar Feb 28 '24 13:02 pradeepdev-1995

Each row in the picture represents a Langkit metric, whose documentation can be found here: https://github.com/whylabs/langkit/blob/main/langkit/docs/modules.md

Each column in the picture represents a whylogs metric. The following example discusses these metrics: https://github.com/whylabs/whylogs/blob/mainline/python/examples/basic/Inspecting_Profiles.ipynb
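
For example, a quick way to see those whylogs metrics (reusing the results profile from the snippet above) is to turn the profile into a pandas DataFrame, with one row per Langkit metric and one column per whylogs metric. A minimal sketch:

profile_df = results.view().to_pandas()

# rows are the Langkit metrics (prompt.aggregate_reading_level, ...),
# columns are the whylogs metrics (counts/n, cardinality/est, distribution/mean, ...)
print(profile_df.columns.tolist())
profile_df.head()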

Let me know if that answers your question!

FelipeAdachi avatar Feb 28 '24 19:02 FelipeAdachi

So how do I analyze this report? [Screenshot from 2024-02-29 10-30-11]

Consider the second row, prompt.aggregate_reading_level (the aggregate reading level of the input text, as calculated by the textstat library), and the column cardinality/est (the estimated number of unique values for each feature). The value is 17.00000001. What does this mean? @FelipeAdachi

pradeepdev-1995 avatar Feb 29 '24 05:02 pradeepdev-1995

It means that, within the data you profiled (which has a size of 50, as shown by counts/n), there are approximately 17 unique values for prompt.aggregate_reading_level. This is not the exact value; it's an estimate, as denoted by the est after cardinality.

The cardinality value can be more or less useful depending on the feature you're inspecting. In this case, it might not be very interesting. For aggregate_reading_level, the distribution metrics might give more useful information, such as the mean, median, or quantile values.
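
As a concrete sketch (assuming the results profile from your first snippet; the column names below are what the profile view's to_pandas() typically produces, so check profile_df.columns if they differ on your version):

profile_df = results.view().to_pandas()

# one row per Langkit metric; pick the one you're interested in
row = profile_df.loc["prompt.aggregate_reading_level"]

print(row["cardinality/est"])      # estimated number of unique values (~17 in your report)
print(row["distribution/mean"])    # mean aggregate reading level over the 50 prompts
print(row["distribution/median"])  # median aggregate reading level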

FelipeAdachi avatar Mar 01 '24 13:03 FelipeAdachi

@FelipeAdachi Okay. So how do I write the code to get the exact values of aggregate_reading_level, character_count, automated_readability_index, sentence_count, etc.? I am looking for the actual per-row values rather than summary statistics like est, mean, median, etc.

pradeepdev-1995 avatar Mar 01 '24 15:03 pradeepdev-1995

Then you can use langkit directly, like this:

from langkit import llm_metrics, extract
import pandas as pd

# a small prompt/response dataset
df = pd.DataFrame({"prompt": ["Hi! how are you?"], "response": ["I'm ok, thanks for asking!"]})

# llm_metrics provides the metric definitions; extract() computes them for each row
# and returns the dataframe with one extra column per metric
enhanced_df = extract(df)

enhanced_df.head()

This will calculate the llm_metrics metrics for each row of the dataframe.
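
The per-row values then live in the extra columns of enhanced_df, one column per metric. For example (the column names below are illustrative; print enhanced_df.columns to see exactly which metrics were registered in your environment):

# list the metric columns that extract() added
print([c for c in enhanced_df.columns if c not in ("prompt", "response")])

# inspect a couple of the textstat-based metrics per row
print(enhanced_df[["prompt.aggregate_reading_level", "prompt.sentence_count"]])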

I'm closing this issue as it is now langkit-related.

FelipeAdachi avatar Mar 01 '24 16:03 FelipeAdachi

@FelipeAdachi Thank you so much

pradeepdev-1995 avatar Mar 04 '24 04:03 pradeepdev-1995