
upload speech to word example notebook set

swanhl opened this pull request 3 years ago · 8 comments

swanhl · Jan 07 '22 08:01

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/cylynx/verifyml/2FokXGiD7caN9Yh14f75XHhx7mrJ
✅ Preview: https://verifyml-git-speech-example-nb-cylynx.vercel.app

vercel[bot] · Jan 07 '22 08:01

The test results of your model card are automatically generated with VerifyML! 🎉

📜 Test Result Summary

| Type of Tests | Pass | Fail |
| --- | --- | --- |
| Explainability Analysis | 0 | 0 |
| Quantitative Analysis | 0 | 0 |
| Fairness Analysis | 3 | 0 |

🔍 Inspect: Breast Cancer Wisconsin (Diagnostic) Dataset
🚨 A public repository is required to use the Model Card Viewer.

github-actions[bot] · Jan 07 '22 08:01

Some questions about this data point from here:

[screenshot of the data point: the truth and prediction word splits with their truth and match counts]

  • How was the match count of 5 derived? It looks like there are 7 matches (everything except 'two', which was transcribed as 'to').
  • The output split column seems to have an additional empty string; would that affect the match count?
  • Even though 'three' appears twice, it won't be double-counted, right?

On a slightly related note, single digits were converted from numbers to words in your notebook. Were there any larger numbers involved? E.g. did any participant read something like 'one hundred' and Google's model returned '100' instead?

jasonho-lynx · Jan 07 '22 08:01

I did an intersection count between the two sets of words (e.g. len(setA.intersection(setB))). So in the example, 'three' is spoken twice and the model got it right both times, but it only counts as 1 match. The assumption is that the model will always correctly transcribe a 'three'; that holds in this example but may not be true in all cases. Also, I spotted a mistake: the truth count in the above example should be 6, not 8, since it's supposed to be a unique count; the match count still stands at 5. Will change it.

Also, there is no ordering in my counting logic. If the truth is 'i went to sleep' and the prediction is 'sleep to went i', the match count will be 4 out of 4, but such cases are close to impossible in practice. If the model transcribes it as 'i went two o sleep', it will be 3 out of 4.

Empty strings only exist in the prediction set and do not add to the match count.
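
A minimal sketch of this counting logic in Python; the names here (count_matches, truth, prediction) are illustrative, not the notebook's actual identifiers:

```python
def count_matches(truth: str, prediction: str) -> tuple[int, int]:
    """Return (match_count, truth_count) via unique-word set intersection."""
    # str.split() with no separator already discards empty tokens, so a
    # stray empty string in the prediction split never inflates the count.
    truth_set = set(truth.split())
    pred_set = set(prediction.split())
    # Set intersection is order-insensitive and counts repeated words once,
    # so a 'three' spoken and transcribed twice contributes a single match.
    return len(truth_set & pred_set), len(truth_set)

# 'three' appears twice but is counted once; 'two' -> 'to' is a miss.
print(count_matches("one two three four five six three",
                    "one to three four five six three"))    # (5, 6)

# No ordering in the logic: a fully scrambled prediction still scores 4/4.
print(count_matches("i went to sleep", "sleep to went i"))  # (4, 4)
```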

And yeah, there are a few participants who say it in 'hundreds' or 'millions', but my digit converter converts word for word, so that's another nuance... Most participants, though, are given long chunks of digits to recite.
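
For the digit nuance, here is a hypothetical word-for-word converter (a plain single-digit lookup; the notebook's actual converter may differ) showing why a transcription like '100' slips through:

```python
# Hypothetical single-digit mapping; the notebook's converter may differ.
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}

def digits_to_words(text: str) -> str:
    """Convert each single-digit token to its word; leave other tokens as-is."""
    return " ".join(DIGIT_WORDS.get(tok, tok) for tok in text.split())

print(digits_to_words("1 2 3"))  # 'one two three'
# A multi-digit transcription passes through unchanged, so a participant
# saying 'one hundred' that gets transcribed as '100' will not match.
print(digits_to_words("100"))    # '100'
```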

swanhl · Jan 07 '22 09:01

The test results of your model card are automatically generated with VerifyML! 🎉

📜 Test Result Summary

| Type of Tests | Pass | Fail |
| --- | --- | --- |
| Explainability Analysis | 0 | 0 |
| Quantitative Analysis | 0 | 0 |
| Fairness Analysis | 3 | 0 |

🔍 Inspect: Breast Cancer Wisconsin (Diagnostic) Dataset
🚨 A public repository is required to use the Model Card Viewer.

github-actions[bot] · Jan 07 '22 10:01

Ok, I was looking for the set intersection bit, LGTM! The MinMaxMetricThreshold might not be very applicable in this case, since the threshold is selected arbitrarily (I think)? But if it's just for the purposes of an example, it should be ok.

jasonho-lynx · Jan 07 '22 10:01

As discussed, let's modify the overview section to mention that we are evaluating Google's speech-to-text model. Thanks.

timlrx · Jan 10 '22 02:01

The test results of your model card are automatically generated with VerifyML! 🎉

📜 Test Result Summary

| Type of Tests | Pass | Fail |
| --- | --- | --- |
| Explainability Analysis | 0 | 0 |
| Quantitative Analysis | 0 | 0 |
| Fairness Analysis | 3 | 0 |

🔍 Inspect: Breast Cancer Wisconsin (Diagnostic) Dataset
🚨 A public repository is required to use the Model Card Viewer.

github-actions[bot] · Jan 10 '22 06:01