AutoPrompt
At least one label specified must be in y_true
Hi, with the latest changes I got a new error when running run_generation_pipeline.py:
Traceback (most recent call last):
File "path_to_repo/AutoPrompt/run_generation_pipeline_alena.py", line 64, in <module>
best_prompt = ranker_pipeline.run_pipeline(opt.num_ranker_steps)
File "path_to_repo/AutoPrompt/optimization_pipeline.py", line 272, in run_pipeline
stop_criteria = self.step()
File "path_to_repo/AutoPrompt/optimization_pipeline.py", line 252, in step
self.eval.add_history(self.cur_prompt, self.task_description)
File "path_to_repo/AutoPrompt/eval/evaluator.py", line 115, in add_history
conf_matrix = confusion_matrix(self.dataset['annotation'],
File "path_to_env/AutoPrompt/lib/python3.10/site-packages/sklearn/utils/_param_validation.py", line 214, in wrapper
return func(*args, **kwargs)
File "path_to_env/AutoPrompt/lib/python3.10/site-packages/sklearn/metrics/_classification.py", line 340, in confusion_matrix
raise ValueError("At least one label specified must be in y_true")
ValueError: At least one label specified must be in y_true
config_ranking.yml and config_generation.yml are not modified. config_default.yml is:
use_wandb: False
dataset:
    name: 'dataset'
    records_path: null
    initial_dataset: ''
    label_schema: ["Yes", "No"]
    max_samples: 5
    semantic_sampling: False # Change to True in case you don't have M1. Currently there is an issue with faiss and M1

# annotator:
#     method : 'argilla'
#     config:
#         api_url: ''
#         api_key: 'admin.apikey'
#         workspace: 'admin'
#         time_interval: 5

annotator:
    method: 'llm'
    config:
        llm:
            type: 'OpenAI'
            name: 'gpt-3.5-turbo-0613'
        instruction: 'Assess whether the text contains a harmful topic.
                      Answer Yes if it does and No otherwise.'
        num_workers: 2
        prompt: 'prompts/predictor_completion/prediction.prompt'
        mini_batch_size: 1
        mode: 'annotation'

predictor:
    method: 'llm'
    config:
        llm:
            type: 'OpenAI'
            name: 'gpt-3.5-turbo-0613'
#            async_params:
#                retry_interval: 10
#                max_retries: 2
            model_kwargs: {"seed": 220}
        num_workers: 2
        prompt: 'prompts/predictor_completion/prediction.prompt'
        mini_batch_size: 1 # change to >1 if you want to include multiple samples in one prompt
        mode: 'prediction'

meta_prompts:
    folder: 'prompts/meta_prompts_classification'
    num_err_prompt: 1 # Number of error examples per sample in the prompt generation
    num_err_samples: 2 # Number of error examples per sample in the sample generation
    history_length: 4 # Number of samples in the meta-prompt history
    num_generated_samples: 10 # Number of generated samples at each iteration
    num_initialize_samples: 10 # Number of generated samples at iteration 0, in the zero-shot case
    samples_generation_batch: 10 # Number of samples generated in one call to the LLM
    num_workers: 5 # Number of parallel workers
    warmup: 4 # Number of warmup steps

eval:
    function_name: 'accuracy'
    num_large_errors: 4
    num_boundary_predictions: 0
    error_threshold: 0.5

llm:
    type: 'OpenAI'
    name: 'gpt-3.5-turbo-0613'
    temperature: 0.8

stop_criteria:
    max_usage: 2 # In $ in case of OpenAI models, otherwise number of tokens
    patience: 3 # Number of patience steps
    min_delta: 0.05 # Delta for the improvement definition
I run the command:
python run_generation_pipeline.py \
--prompt "Write a good and comprehensive movie review about a specific movie." \
--task_description "Assistant is a large language model that is tasked with writing movie reviews."
Hi, observe that you modified the annotator to be an LLM estimator. However, the annotator's instruction asks the model to classify 'Yes' or 'No', whereas the ranker labels are '1', '2', ..., '5' (see the label_schema in config_ranking). In this case the annotator produces labels that do not exist in the schema, which results in this error.
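Concretely, this is the sklearn check that fails: confusion_matrix raises this ValueError whenever none of the labels passed via the labels argument appear in y_true (here, the 'annotation' column of the dataset). A minimal sketch, independent of AutoPrompt:

from sklearn.metrics import confusion_matrix

# y_true plays the role of the 'annotation' column, labels of the pipeline's label_schema.
y_true = ["Yes", "No", "Yes"]
y_pred = ["1", "2", "3"]

# None of the specified labels occur in y_true, so sklearn raises:
# ValueError: At least one label specified must be in y_true
confusion_matrix(y_true, y_pred, labels=["1", "2", "3", "4", "5"])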
An example of a valid instruction for your task: "Analyze the following movie review and provide a score between 1 and 5".
One more thing: I see that you are using gpt-3.5 for the meta-prompts (and the annotator). This will not work well; especially for generation tasks, it is important to use GPT-4/4.5 to get optimal performance.
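A quick way to verify the label mismatch is to check that every value in the dataset's 'annotation' column is inside the ranker's label_schema. A hedged sketch (the dump path is an assumption; point it at your actual run folder):

import pandas as pd

label_schema = {"1", "2", "3", "4", "5"}      # label_schema from config_ranking.yml
df = pd.read_csv("dump/ranker/dataset.csv")   # assumed path; use your own dump folder

unknown = set(df["annotation"].dropna().astype(str)) - label_schema
if unknown:
    print(f"Annotator produced labels outside the schema: {sorted(unknown)}")
else:
    print("All annotations are inside the label schema.")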
Thanks! It works! This example is worth adding to the documentation.
Hi @Eladlev, I am working on a generation task. Following the instructions, in config_generation I set:
annotator:
    method: ''
and in config_default:
method: 'llm'
config:
    llm:
        type: 'OpenAI'
        name: 'gpt-4'
    instruction:
        "Assess this generated message,
        1. does it align with the intent of user input,
        2. does it rephrase user input,
        If all the answers are Yes, then response '1', otherwise response '0'"
    num_workers: 5
    prompt: 'prompts/predictor_completion/prediction.prompt'
    mini_batch_size: 1
    mode: 'annotation'
Is it expected that in dump/generator/dataset.csv the 'prediction' and 'score' columns are all blank? And could you explain the role of the 'annotator' in generation tasks?
Thank you
Hi @danielliu99,
- At least the 'prediction' column should not be blank (after the iteration is completed).
- If you are using an LLM ranker, you should skip the ranker training phase (since you already have an LLM ranker) and change this line:
https://github.com/Eladlev/AutoPrompt/blob/7f373f219aa360cd2de38c6aa700c1dff282d7de/run_generation_pipeline.py#L53
To:
generation_config_params.eval.function_params.instruction = ranker_config_params.annotator.config.instruction
- In the generation task there are two phases. In phase 1 we train a ranker prompt (this is the phase that should be skipped in your case); in that phase the role of the annotator is the same as in the classification task. In the second phase we do not use an annotator (this is why the method is left blank). Instead, we modify the score function to be the rank produced by the ranking model and apply it directly to the model's predictions (so there is no need for a separate annotator, since ranking is part of the score function).
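To make the second phase concrete, here is a conceptual sketch (not AutoPrompt's actual code; the function name and types are illustrative) of the idea that the ranker's label is used directly as the evaluation score of a generated sample:

from typing import Callable

def ranker_score(generated_text: str, rank_fn: Callable[[str], str]) -> float:
    """rank_fn maps a generated text to a label in {'1', ..., '5'}, e.g. an LLM call
    that uses the ranker/annotator instruction. The sample's score is simply that rank."""
    return float(rank_fn(generated_text))

# Dummy usage with a stub ranker that always answers '4':
print(ranker_score("Some generated movie review ...", lambda text: "4"))  # -> 4.0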