whylogs
What are the different metrics mentioned
I see multiple metrics when I run the following code:
import whylogs as why
from langkit.whylogs.samples import load_chats
chats = load_chats()
results = why.log(chats)
What do these metrics actually mean? Where can I find the documentation?
Each row in the picture represents a Langkit metric, whose documentation can be found here: https://github.com/whylabs/langkit/blob/main/langkit/docs/modules.md
Each column in the picture represents a whylogs metric. The following example discusses these metrics: https://github.com/whylabs/whylogs/blob/mainline/python/examples/basic/Inspecting_Profiles.ipynb
Let me know if that answers your question!
So how do I analyze this report?
Consider the second row. Row: prompt.aggregate_reading_level (the aggregate reading level of the input text, as calculated by the textstat library). Column: cardinality/est (the estimated number of unique values for each feature). The value is 17.00000001. What does this mean? @FelipeAdachi
It means that, within the data you profiled (50 rows, as shown by counts/n), there are approximately 17 unique values for prompt.aggregate_reading_level. This is an estimate rather than an exact count, as denoted by the est after cardinality.
The cardinality value can be more or less useful depending on the feature you're inspecting. In this case, it might not be very interesting. For aggregate_reading_level, the distribution metrics, such as mean, median, or the quantile values, might give more useful information.
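To make the difference between these summaries concrete, here is a minimal sketch in plain Python. It does not use whylogs: the scores are made-up values, not the data from this thread, and the count of distinct values is computed exactly here, whereas the profile reports an estimate (hence est and a value like 17.00000001).

```python
import statistics

# Hypothetical per-row reading-level scores (illustrative only).
reading_levels = [3.2, 5.1, 3.2, 7.8, 5.1, 9.4, 3.2, 7.8, 6.0, 5.1]

# Analogue of counts/n: how many values were profiled.
n = len(reading_levels)

# Analogue of cardinality: number of distinct values (exact here,
# while the profile column cardinality/est is an estimate).
unique_count = len(set(reading_levels))

# Analogues of the distribution metrics.
mean = statistics.mean(reading_levels)
median = statistics.median(reading_levels)
quartiles = statistics.quantiles(reading_levels, n=4)  # three cut points

print(f"n={n}, unique={unique_count}")
print(f"mean={mean:.2f}, median={median:.2f}, quartiles={quartiles}")
```

For a numeric score like this, the number of distinct values says little, while the mean, median, and quantiles describe where the scores actually fall.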
@FelipeAdachi Okay. So to get the exact values for aggregate_reading_level, character_count, automated_readability_index, sentence_count, etc., how should I write the code? I am looking for the actual values rather than summaries like est, mean, median, etc.
Then you can use langkit directly, like this:
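# Importing llm_metrics registers the LangKit metrics that extract() computes.
from langkit import llm_metrics, extract
import pandas as pd
df = pd.DataFrame({"prompt": ["Hi! how are you?"], "response": ["I'm ok, thanks for asking!"]})
# extract() returns the dataframe with the metric values added as new columns.
enhanced_df = extract(df)
enhanced_df.head()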
This will calculate the metrics in llm_metrics for each row of the dataframe.
I'm closing this issue as it is now langkit-related.
@FelipeAdachi Thank you so much