autolabel
Label, clean and enrich text datasets with LLMs.
https://github.com/refuel-ai/autolabel/blob/81b5ff6a88a3d9d66a99a8fc493f41a9871d3547/src/autolabel/labeler.py#L59 https://github.com/refuel-ai/autolabel/blob/81b5ff6a88a3d9d66a99a8fc493f41a9871d3547/src/autolabel/labeler.py#L84
Currently, if the LLM raises any exception during generation, we return an empty generation, which is later caught as an Invalid Response Parsing Error. This is misleading,...
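One way to keep the two failure modes apart is to record the exception alongside the generation instead of replacing it with empty text. The sketch below is illustrative only: `GenerationResult` and `safe_generate` are hypothetical names, not part of autolabel, and `llm_call` stands in for whatever function actually calls the model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GenerationResult:
    """Hypothetical container (not autolabel's API): either text or an error."""
    text: Optional[str] = None
    error: Optional[str] = None


def safe_generate(llm_call: Callable[[str], str], prompt: str) -> GenerationResult:
    """Call the model and keep the exception details rather than returning empty text."""
    try:
        return GenerationResult(text=llm_call(prompt))
    except Exception as exc:
        # Preserve the error type and message so it is not later mistaken
        # for a response-parsing failure.
        return GenerationResult(error=f"{type(exc).__name__}: {exc}")
```

A caller can then check `result.error` first and only attempt parsing when `result.text` is present.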
Parse JSON outputs from explanations. Chain of thought requires JSON in order to run on datasets like squad and banking. We need to parse outputs from the LLM and send...
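A minimal sketch of pulling a JSON object out of a free-text LLM response, assuming the model emits an explanation followed by (or surrounding) a JSON object. `extract_json` is a hypothetical helper, not an autolabel function; it relies only on the standard library's `json.JSONDecoder.raw_decode`.

```python
import json
from typing import Any, Optional


def extract_json(response: str) -> Optional[Any]:
    """Return the first decodable JSON object in `response`, or None if there is none."""
    decoder = json.JSONDecoder()
    start = response.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores any trailing text after it.
            obj, _end = decoder.raw_decode(response, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = response.find("{", start + 1)
    return None
```

Scanning from each opening brace avoids a hand-written regex, which tends to break on nested objects or braces inside the explanation text.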
Parsing the following response fails when using the regex specified ``` ontext: But bounding the computation time above by some concrete function f(n) often yields complexity classes that depend on...
This will require us to remove usage of autocommit=True when initializing engine. So we'll need to explicitly commit a change/transaction.
While labeling 25k examples, we see that the cache size grows to 500MB and retrieval from the cache takes 0.03s per example, which translates to around 10 minutes for 15k...
https://github.com/refuel-ai/autolabel/blob/aa04feb500ad74834b609f11d5f24c866996a0c0/docs/index.md?plain=1#L7 I think this page is removed now?
Track token usage and cost during labeling runs. Modify the output of the cost computation during plan to clarify that this is an upper-bound cost.
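A rough sketch of what such tracking could look like, assuming per-1k-token prices for prompt and completion tokens. `UsageTracker` and its fields are hypothetical, and the prices are supplied by the caller rather than taken from any provider's price list.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Hypothetical accumulator for token counts and cost across a labeling run."""
    prompt_price_per_1k: float
    completion_price_per_1k: float
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        """Add the token counts reported for one LLM call."""
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens

    @property
    def cost(self) -> float:
        """Cost of the tokens recorded so far."""
        return (
            self.prompt_tokens * self.prompt_price_per_1k
            + self.completion_tokens * self.completion_price_per_1k
        ) / 1000

    def upper_bound(self, prompt_tokens: int, max_completion_tokens: int) -> float:
        """Planning estimate: assumes every call uses its full completion budget."""
        return (
            prompt_tokens * self.prompt_price_per_1k
            + max_completion_tokens * self.completion_price_per_1k
        ) / 1000
```

The planning estimate is an upper bound because it charges for the maximum completion length, while `cost` reflects only tokens actually recorded.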
Currently, if `example_template` is not provided, `agent.plan` fails with an unintuitive error: ``` In [6]: agent.plan('docs/assets/movie_reviews.csv') ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:-- --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) Input In [6], in...
When a user attempts to run `agent.plan` or `agent.run` on a dataset, we should first validate that any data columns needed for labeling/evaluation are in the correct format. For example,...
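As an illustration of the kind of up-front check described above, the sketch below verifies that required columns exist before any labeling starts. The function names and the idea of passing column names as plain strings are assumptions for the example, not autolabel's actual interface.

```python
from typing import Iterable, List


def missing_columns(available: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required column names absent from the dataset, in order."""
    present = set(available)
    return [name for name in required if name not in present]


def validate_columns(available: Iterable[str], required: Iterable[str]) -> None:
    """Raise a descriptive error before labeling if any required column is missing."""
    missing = missing_columns(available, required)
    if missing:
        raise ValueError(
            "Dataset is missing required column(s): " + ", ".join(missing)
        )
```

Failing early with the missing column names gives the user an actionable message instead of an error raised deep inside the labeling loop.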