amazon-bedrock-workshop F1 score in 01_fine-tuning-titan-lite.ipynb

Open jicowan opened this issue 3 months ago • 2 comments

I had to run the following code block twice before it would output the scores. The first time I ran it, the output was blank:

from bert_score import score
reference_summary = [reference_summary]
fine_tuned_model_P, fine_tuned_R, fine_tuned_F1 = score(fine_tuned_generated_response, reference_summary, lang="en")
base_model_P, base_model_R, base_model_F1 = score(base_model_generated_response, reference_summary, lang="en")
print("F1 score: base model ", base_model_F1)
print("F1 score: fine-tuned model", fine_tuned_F1)

Final output:

F1 score: base model  tensor([0.8868])
F1 score: fine-tuned model tensor([0.8532])
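
A side note on the cell above: `reference_summary = [reference_summary]` wraps the variable in another list every time the cell is executed, so a second run passes a nested list to `score`. Whether that is related to the blank first output is not established here. Below is a minimal sketch of a version that is safe to re-run. `as_list` and `compare_f1` are hypothetical helper names, not part of the notebook or of `bert_score`, and the inputs are assumed to hold either a single string or a list of strings.

```python
def as_list(text_or_list):
    """Wrap a bare string in a list; copy an existing list unchanged.

    Calling this repeatedly never adds another level of nesting.
    """
    if isinstance(text_or_list, str):
        return [text_or_list]
    return list(text_or_list)


def compare_f1(base_response, fine_tuned_response, reference, lang="en"):
    """Return (base_f1, fine_tuned_f1) as plain Python floats.

    Assumes exactly one candidate and one reference per model, so each
    F1 tensor returned by bert_score holds a single element.
    """
    # Imported lazily so the helpers above can be used without bert_score.
    from bert_score import score

    refs = as_list(reference)
    _, _, base_f1 = score(as_list(base_response), refs, lang=lang)
    _, _, tuned_f1 = score(as_list(fine_tuned_response), refs, lang=lang)
    # .item() converts a one-element tensor to a float for cleaner printing.
    return base_f1.item(), tuned_f1.item()
```

With the notebook's variables this could be called as `compare_f1(base_model_generated_response, fine_tuned_generated_response, reference_summary)`, and the two returned floats printed in place of the raw tensors.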

Apr 25 '24 19:04 jicowan

You might want to consider using the Model Evaluation feature within Bedrock to compare the models rather than using the score function.

Apr 25 '24 21:04 jicowan

Thanks @jicowan - we will get back to this; prioritizing other bugs for now.

May 15 '24 19:05 w601sxs
