stanford_alpaca
How did you augment the data?
Hello, I am working through Alpaca step by step.
I followed the current regen.jsonl and got the output below.
[
  {
    "instruction": "Retrieve the biggest peak in the world.",
    "input": "",
    "output": "The highest peak in the world is Mount Everest, which has a summit elevation of 8,848 meters (29,032 feet).",
    "most_similar_instructions": {
      "find the toxic word or phrase in the sentence.": 0.375,
      "Identify the bias or stereotype in the given prompt.": 0.375,
      "Replace all the human names in the paragraph with <anonymized>.": 0.3529411764705882,
      "Replace the placeholders in the given text with appropriate named entities.": 0.33333333333333326,
      "Identify the pos tag of the word in the given sentence.": 0.33333333333333326,
      "Find the misspelling in the sentence, and give me the correct spelling.": 0.3157894736842105,
      "Return the SSN number for the person.": 0.2857142857142857,
      "Select the oldest person from the list.": 0.2857142857142857,
      "Give me the definition of the word.": 0.2857142857142857,
      "Extract all the country names in the paragraph, and list them separated by commas.": 0.2857142857142857
    },
    "avg_similarity_score": 0.12068036281982937
  },
Do you create the output data from the generated instructions? If so, how did you create the input data? Did you enter all of it by hand, for all 52k examples?
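A side note on the record above: the values under most_similar_instructions look consistent with a token-level ROUGE-L F-measure between the new instruction and each existing one. This is my assumption, not something I have confirmed from the repository code. Here is a minimal sketch of that computation; the helper names (tokenize, lcs_length, rouge_l_f1) are hypothetical, and the tokenizer is a plain lowercase alphanumeric split rather than whatever the project actually uses.

```python
import re


def tokenize(text):
    # Assumption: lowercase the text and keep only alphanumeric runs as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def lcs_length(a, b):
    # Length of the longest common subsequence of two token lists,
    # computed row by row with dynamic programming.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    # ROUGE-L F-measure: harmonic mean of LCS-based precision and recall.
    ref, cand = tokenize(reference), tokenize(candidate)
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


new_instruction = "Retrieve the biggest peak in the world."
print(round(rouge_l_f1("find the toxic word or phrase in the sentence.", new_instruction), 4))  # 0.375
print(round(rouge_l_f1("Return the SSN number for the person.", new_instruction), 4))  # 0.2857
```

Both printed values match the scores listed for those two instructions in the record, which is why I suspect this is the metric being used.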