autolabel
Label, clean and enrich text datasets with LLMs.
**Is your feature request related to a problem? Please describe.** The LLM might sometimes have trouble following output format instructions. **Describe the solution you'd like** `logit_bias`: https://twitter.com/AAAzzam/status/1669753725093654565
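
To make the `logit_bias` idea concrete, here is a minimal sketch, not autolabel's implementation. It assumes an OpenAI-style `logit_bias` parameter, which maps token IDs to a bias between -100 and 100. The function name `build_logit_bias` and the `encode` callable are hypothetical; in practice `encode` would be a real tokenizer's encode function, and the toy encoder below exists only so the example runs on its own.

```python
from typing import Callable, Dict, Iterable, List


def build_logit_bias(
    labels: Iterable[str],
    encode: Callable[[str], List[int]],
    bias: int = 100,
) -> Dict[int, int]:
    """Give every token id that appears in any allowed label the same bias.

    `encode` is an assumption: any callable that turns text into token ids.
    """
    if not -100 <= bias <= 100:
        raise ValueError("bias must be between -100 and 100")
    logit_bias: Dict[int, int] = {}
    for label in labels:
        for token_id in encode(label):
            logit_bias[token_id] = bias
    return logit_bias


def toy_encode(text: str) -> List[int]:
    """Toy encoder for illustration only: one fake token id per character."""
    return [ord(ch) for ch in text]


# Bias the model toward answering with "A" or "B".
example_bias = build_logit_bias(["A", "B"], toy_encode)
```

This approach is most predictable when each label is a single token. For multi-token labels the sketch biases every constituent token, which nudges the output toward the labels but does not guarantee one of them.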
Currently, the labeling config sent to the Labeling Agent can contain typos in its keys, which leads to the default value being used without informing the user....
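
One possible way to surface such typos is sketched below. It assumes the config is a plain dictionary and that the set of allowed keys is known up front. The helper `find_unknown_keys` and the key names `task_name` and `labels` are hypothetical and are not taken from autolabel's actual config schema.

```python
import difflib
from typing import Any, Dict, Iterable, List


def find_unknown_keys(
    config: Dict[str, Any],
    allowed_keys: Iterable[str],
    cutoff: float = 0.6,
) -> Dict[str, List[str]]:
    """Return each unrecognized key with its closest allowed key, if any."""
    allowed = list(allowed_keys)
    unknown: Dict[str, List[str]] = {}
    for key in config:
        if key not in allowed:
            # get_close_matches returns an empty list when nothing is similar.
            unknown[key] = difflib.get_close_matches(
                key, allowed, n=1, cutoff=cutoff
            )
    return unknown


# Hypothetical config with a misspelled key.
suspect = find_unknown_keys(
    {"task_nme": "classification", "labels": ["a", "b"]},
    allowed_keys=["task_name", "labels"],
)
for key, suggestions in suspect.items():
    hint = f" (did you mean '{suggestions[0]}'?)" if suggestions else ""
    print(f"Unknown config key '{key}'{hint}")
```

Whether an unknown key should produce a warning or an error is a design choice. A warning keeps existing configs working, while an error prevents silent fallback to defaults.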
We can now load datasets from Hugging Face, as shown below:

```python
dataset = load_dataset("lex_glue", "ledgar")
test_dataset = dataset["test"]
test_dataset = map_label_to_string(test_dataset, "label")
test_dataset = test_dataset.rename_column("text", "example")
ledgar_path = Path("../autolabel/examples/ledgar")
with...
```
Keeps the same progress bar as before, but now shows the number of bytes downloaded
Added a Streamlit web app where users can test autolabel directly by running `streamlit run app.py`. Note that this is not yet deployed, so it runs...
**Is your feature request related to a problem? Please describe.** Extracting row-level information is more difficult than it needs to be, given the current output and how it...
**Describe the bug** In `LabelingAgent.run`, `csv_file_name` is only used to define and cache the task, but now that there is support for more than just CSVs, we need a more...
**Is your feature request related to a problem? Please describe.** Sometimes, during a `question_answering` task, the LLM will output something that is not part of the multiple-choice options for the...
**Describe the bug** Currently, an LLM can produce an output that doesn't match one of the labels in `labels`. In this case, we assume that is the final label produced...
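
A minimal sketch of one way to catch this case is shown below. It assumes labels are plain strings and that light normalization (whitespace, surrounding quotes, a trailing period, case) is acceptable. The helper `match_label` is hypothetical and is not part of autolabel.

```python
from typing import Optional, Sequence


def match_label(raw_output: str, labels: Sequence[str]) -> Optional[str]:
    """Return the matching label from `labels`, or None if nothing matches."""
    # Normalize: trim whitespace, then surrounding quotes and periods, then case.
    cleaned = raw_output.strip().strip("\"'.").lower()
    for label in labels:
        if label.lower() == cleaned:
            return label
    return None


labels = ["positive", "negative"]
print(match_label(" Positive. ", labels))
print(match_label("somewhat mixed", labels))
```

Returning `None` instead of the raw output lets the caller decide what to do with an invalid label, for example retrying the prompt or flagging the row for review.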