autolabel

Label, clean and enrich text datasets with LLMs.

Results: 124 autolabel issues, sorted by most recently updated

https://github.com/refuel-ai/autolabel/blob/81b5ff6a88a3d9d66a99a8fc493f41a9871d3547/src/autolabel/labeler.py#L59
https://github.com/refuel-ai/autolabel/blob/81b5ff6a88a3d9d66a99a8fc493f41a9871d3547/src/autolabel/labeler.py#L84

enhancement
data loading

Currently, if the LLM raises any exception during generation, we return an empty generation, which is later caught as an Invalid Response Parsing Error. This is misleading,...

llm
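
A minimal sketch of one way to keep these two failure modes apart, assuming a hypothetical `GenerationResult` type and `generate_with_error_capture` helper (neither is part of autolabel): the exception message is stored alongside the empty generation, so downstream code can report an LLM error instead of a parsing error.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GenerationResult:
    """Hypothetical container: raw LLM text plus any error raised while producing it."""
    text: str
    error: Optional[str] = None


def generate_with_error_capture(llm_call: Callable[[str], str], prompt: str) -> GenerationResult:
    """Call the LLM; on failure, keep the exception message instead of silently returning ''."""
    try:
        return GenerationResult(text=llm_call(prompt))
    except Exception as exc:  # broad on purpose: any provider error should be surfaced
        return GenerationResult(text="", error=f"{type(exc).__name__}: {exc}")


def classify_failure(result: GenerationResult, parsed_label: Optional[str]) -> str:
    """Distinguish an LLM generation error from a genuine response-parsing failure."""
    if result.error is not None:
        return f"LLM generation error ({result.error})"
    if parsed_label is None:
        return "Invalid response parsing error"
    return "ok"


def flaky_llm(prompt: str) -> str:
    # Stand-in for a provider call that fails (hypothetical).
    raise TimeoutError("request timed out")


result = generate_with_error_capture(flaky_llm, "Label this review: ...")
print(classify_failure(result, parsed_label=None))
# prints: LLM generation error (TimeoutError: request timed out)
```

The design choice is simply to carry the error as data rather than encode it as an empty string, so the two cases can never be confused later in the pipeline.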

Parse JSON outputs from explanations. Chain-of-thought prompting requires JSON output in order to run on datasets like SQuAD and banking. We need to parse outputs from the LLM and send...

llm
labeling_task
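
Because chain-of-thought responses mix free-form reasoning with a JSON answer, one possible approach is to scan the output for the first decodable JSON object. The sketch below uses only the standard-library `json.JSONDecoder.raw_decode`; the function name and the `label`/`explanation` keys are illustrative assumptions, not autolabel's actual output schema.

```python
import json
from typing import Optional


def extract_first_json_object(llm_output: str) -> Optional[dict]:
    """Return the first JSON object embedded in free-form LLM text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = llm_output.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text.
            obj, _end = decoder.raw_decode(llm_output, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # this "{" did not begin valid JSON; try the next one
        start = llm_output.find("{", start + 1)
    return None


output = 'The review praises the plot, so the sentiment is positive.\n{"explanation": "Praises the plot", "label": "positive"}'
parsed = extract_first_json_object(output)
print(parsed["label"])  # positive
```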

Parsing the following response fails when using the specified regex:

```
ontext: But bounding the computation time above by some concrete function f(n) often yields complexity classes that depend on...
```

labeling_task

This will require removing `autocommit=True` when initializing the engine, so we will need to explicitly commit each change/transaction.

caching & state
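
The explicit-commit pattern can be illustrated with the standard-library `sqlite3` module. This is a stand-in, not the database engine autolabel actually uses, and the `generation_cache` table and `save_generation` helper are hypothetical. Using the connection as a context manager commits on success and rolls back on error, so no change is persisted until its transaction completes.

```python
import sqlite3


def save_generation(conn: sqlite3.Connection, key: str, value: str) -> None:
    """Write one cache entry inside an explicit transaction (hypothetical schema)."""
    # `with conn:` commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO generation_cache (cache_key, generation) VALUES (?, ?)",
            (key, value),
        )


# Default (non-autocommit) connection: writes open a transaction that must be committed.
conn = sqlite3.connect(":memory:")
with conn:
    conn.execute("CREATE TABLE generation_cache (cache_key TEXT PRIMARY KEY, generation TEXT)")

save_generation(conn, "prompt-hash-123", "positive")

# A failure mid-transaction is rolled back instead of leaving a partial write behind.
try:
    with conn:
        conn.execute("INSERT INTO generation_cache VALUES (?, ?)", ("prompt-hash-456", "negative"))
        raise RuntimeError("simulated failure mid-transaction")
except RuntimeError:
    pass

print(conn.execute("SELECT COUNT(*) FROM generation_cache").fetchone()[0])  # 1
```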

While labeling 25k examples, we see the cache grow to 500 MB, and retrieval from the cache takes 0.03 s per example, which translates to around 10 minutes for 15k...

caching & state
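
One common way to cut per-example lookup overhead is to fetch cache entries in batches, issuing one query per chunk of keys instead of one query per example. The sketch below assumes a hypothetical `generation_cache` table in SQLite and a hypothetical `lookup_batch` helper. It is not autolabel's cache implementation, and no timing improvement is claimed here.

```python
import sqlite3
from typing import Dict, List


def lookup_batch(conn: sqlite3.Connection, keys: List[str], chunk_size: int = 500) -> Dict[str, str]:
    """Fetch cached generations for many keys, one query per chunk rather than per key."""
    found: Dict[str, str] = {}
    for i in range(0, len(keys), chunk_size):
        chunk = keys[i : i + chunk_size]
        # Chunking keeps each query under SQLite's limit on bound parameters.
        placeholders = ", ".join("?" for _ in chunk)
        rows = conn.execute(
            f"SELECT cache_key, generation FROM generation_cache WHERE cache_key IN ({placeholders})",
            chunk,
        )
        found.update(rows)  # each row is a (cache_key, generation) pair
    return found


conn = sqlite3.connect(":memory:")
with conn:
    conn.execute("CREATE TABLE generation_cache (cache_key TEXT PRIMARY KEY, generation TEXT)")
    conn.executemany(
        "INSERT INTO generation_cache VALUES (?, ?)",
        [(f"k{i}", f"label{i % 3}") for i in range(1200)],
    )

# 1500 requested keys, of which the first 1200 are cached; 3 queries instead of 1500.
hits = lookup_batch(conn, [f"k{i}" for i in range(1500)])
print(len(hits))  # 1200
```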

https://github.com/refuel-ai/autolabel/blob/aa04feb500ad74834b609f11d5f24c866996a0c0/docs/index.md?plain=1#L7 It looks like this linked page has been removed?

documentation

Track token usage and cost during labeling runs. Modify the output of the cost computation during `plan` to clarify that it is an upper-bound cost.

llm
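
A minimal sketch of both halves of this issue, assuming hypothetical names (`CostTracker`, `plan_cost_upper_bound`) and placeholder per-1K-token prices that are not real provider rates. During a run, the actual token counts are accumulated. At plan time, the estimate assumes every completion uses the maximum completion length, which is why it can only be an upper bound.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CostTracker:
    """Hypothetical run-time tracker; prices are placeholders."""
    prompt_price_per_1k: float
    completion_price_per_1k: float
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        """Accumulate the actual token counts reported for one LLM call."""
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens

    @property
    def cost(self) -> float:
        return (self.prompt_tokens / 1000 * self.prompt_price_per_1k
                + self.completion_tokens / 1000 * self.completion_price_per_1k)


def plan_cost_upper_bound(prompt_token_counts: List[int], max_completion_tokens: int,
                          prompt_price_per_1k: float, completion_price_per_1k: float) -> float:
    """Plan-time estimate: assume every completion uses max_completion_tokens."""
    total_prompt = sum(prompt_token_counts)
    total_completion = max_completion_tokens * len(prompt_token_counts)
    return (total_prompt / 1000 * prompt_price_per_1k
            + total_completion / 1000 * completion_price_per_1k)


upper = plan_cost_upper_bound([400, 600], max_completion_tokens=100,
                              prompt_price_per_1k=0.002, completion_price_per_1k=0.002)
tracker = CostTracker(prompt_price_per_1k=0.002, completion_price_per_1k=0.002)
tracker.record(400, 10)  # actual completions are usually shorter than the maximum
tracker.record(600, 20)
print(f"plan (upper bound): ${upper:.5f}, actual: ${tracker.cost:.5f}")
# prints: plan (upper bound): $0.00240, actual: $0.00206
```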

Currently, if `example_template` is not provided, `agent.plan` fails with an unintuitive error:

```
In [6]: agent.plan('docs/assets/movie_reviews.csv')
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:--
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [6], in...
```
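
A minimal sketch of failing fast with a readable message before planning starts. Only the `example_template` key comes from the issue; the nested `prompt` section, the `ConfigError` class, and the `validate_prompt_config` helper are hypothetical assumptions about how a config might be laid out, not autolabel's actual API.

```python
from typing import Any, Dict, List

# Hypothetical list of required keys; only example_template is taken from the issue.
REQUIRED_PROMPT_KEYS = ["example_template"]


class ConfigError(ValueError):
    """Raised when the labeling config is missing a required setting."""


def validate_prompt_config(config: Dict[str, Any]) -> None:
    """Raise a descriptive error instead of an AttributeError deep inside plan()."""
    prompt = config.get("prompt", {})
    missing: List[str] = [key for key in REQUIRED_PROMPT_KEYS if not prompt.get(key)]
    if missing:
        raise ConfigError(
            f"Missing required prompt setting(s): {', '.join(missing)}. "
            "Add them under the 'prompt' section of the config before calling plan()."
        )


try:
    validate_prompt_config({"task_name": "MovieSentiment", "prompt": {"task_guidelines": "..."}})
except ConfigError as err:
    print(err)
```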

When a user attempts to run `agent.plan` or `agent.run` on a dataset, we should first validate that any data columns needed for labeling/evaluation are in the correct format. For example,...
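
A minimal sketch of such a pre-flight check, assuming the dataset has been loaded as a list of row dicts (here via the standard-library `csv.DictReader`) and that the caller knows which columns labeling/evaluation needs. The column names and the `validate_columns` helper are hypothetical, and "correct format" is simplified to "present and non-empty".

```python
import csv
import io
from typing import Dict, Iterable, List


def validate_columns(rows: List[Dict[str, str]], required: Iterable[str]) -> List[str]:
    """Return human-readable problems: missing columns, or empty values in required columns."""
    if not rows:
        return ["dataset is empty"]
    problems: List[str] = []
    present = set(rows[0].keys())
    for col in required:
        if col not in present:
            problems.append(f"missing column '{col}'")
            continue
        # 0-based data-row indices whose value is missing or whitespace-only.
        empty_rows = [i for i, row in enumerate(rows) if not (row.get(col) or "").strip()]
        if empty_rows:
            problems.append(f"column '{col}' is empty in row(s) {empty_rows}")
    return problems


data = "example,label\nGreat film!,positive\nDull plot,\n"
rows = list(csv.DictReader(io.StringIO(data)))
problems = validate_columns(rows, ["example", "label", "explanation"])
print(problems)
# prints: ["column 'label' is empty in row(s) [1]", "missing column 'explanation'"]
```

Returning a list of problems rather than raising on the first one lets `plan`/`run` report every issue in the dataset at once.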