autolabel issues

Move CSV reading, DataFrame reading, etc. into a "Dataset loader" class

6

https://github.com/refuel-ai/autolabel/blob/81b5ff6a88a3d9d66a99a8fc493f41a9871d3547/src/autolabel/labeler.py#L59 https://github.com/refuel-ai/autolabel/blob/81b5ff6a88a3d9d66a99a8fc493f41a9871d3547/src/autolabel/labeler.py#L84

nihit

enhancement

data loading
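A minimal sketch of what such a loader could look like, assuming every input source is normalized into a list of row dictionaries. The class name `DatasetLoader`, its methods, and the `max_items`/`start_index` parameters are hypothetical and are not taken from the autolabel codebase. A plain list of records stands in for a DataFrame so the example needs only the standard library.

```python
import csv
import io
from typing import Dict, Iterable, List, Mapping, Optional, TextIO


class DatasetLoader:
    """Hypothetical loader that turns any supported input into a list of row dicts."""

    def __init__(self, max_items: Optional[int] = None, start_index: int = 0):
        self.max_items = max_items
        self.start_index = start_index

    def _slice(self, rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
        # Apply the same windowing regardless of where the rows came from.
        end = None if self.max_items is None else self.start_index + self.max_items
        return rows[self.start_index:end]

    def from_csv(self, handle: TextIO, delimiter: str = ",") -> List[Dict[str, str]]:
        # csv.DictReader uses the first line as the column names.
        return self._slice(list(csv.DictReader(handle, delimiter=delimiter)))

    def from_records(self, records: Iterable[Mapping[str, str]]) -> List[Dict[str, str]]:
        # Stand-in for DataFrame input: any iterable of mappings is accepted.
        return self._slice([dict(record) for record in records])


loader = DatasetLoader(max_items=2)
csv_text = "text,label\ngood movie,positive\nbad movie,negative\nok movie,neutral\n"
rows = loader.from_csv(io.StringIO(csv_text))
```

With this shape, the labeler would only ever see `rows`, and format-specific reading would live in one place.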

LLM label method should pass through exceptions instead of returning empty generations

Currently, in case of any exception from the LLM for generation, we return an empty generation, which will later be caught as an Invalid Response Parsing Error. This is misleading,...

yadavsahil197

llm
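One possible shape for the proposed behavior, as a sketch only: the provider error is re-raised with context instead of being swallowed into an empty generation. `LLMGenerationError`, `label`, and `flaky_generate` are hypothetical names used for illustration and do not reflect the actual autolabel API.

```python
from typing import Callable, List


class LLMGenerationError(Exception):
    """Hypothetical error type that wraps a provider failure."""


def label(prompts: List[str], generate: Callable[[str], str]) -> List[str]:
    results = []
    for prompt in prompts:
        try:
            results.append(generate(prompt))
        except Exception as exc:
            # Re-raise with context rather than appending an empty generation,
            # so the caller sees the real cause and not a later parsing error.
            raise LLMGenerationError(f"generation failed for prompt {prompt!r}") from exc
    return results


def flaky_generate(prompt: str) -> str:
    # Stand-in for a real LLM call; fails on demand to show the error path.
    if "fail" in prompt:
        raise TimeoutError("provider timed out")
    return "positive"
```

Chaining with `from exc` keeps the original exception available as `__cause__`, so the traceback shows both the wrapper and the underlying provider error.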

Support parsing using JSON outputs

Parse JSON outputs from explanations. Chain of thought requires JSON in order to run on datasets like squad and banking. We need to parse outputs from the LLM and send...

rajasbansal

llm

labeling_task

JSON parsing when the explanation has curly braces

Parsing the following response fails when using the regex specified ``` ontext: But bounding the computation time above by some concrete function f(n) often yields complexity classes that depend on...

rajasbansal

labeling_task
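A sketch of one alternative to regex matching, assuming the response contains free text followed by a JSON object: try a real JSON decode at each opening brace and keep the last object that parses. The function name `extract_last_json_object` and the sample response are hypothetical; only the standard library `json.JSONDecoder.raw_decode` is used.

```python
import json
from typing import Any, Dict, Optional


def extract_last_json_object(text: str) -> Optional[Dict[str, Any]]:
    decoder = json.JSONDecoder()
    found = None
    pos = text.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at pos and
            # returns it together with the index just past its end.
            obj, end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            # This brace belongs to prose, not JSON; try the next one.
            pos = text.find("{", pos + 1)
            continue
        if isinstance(obj, dict):
            found = obj
        # Skip past the parsed object so braces inside its strings are ignored.
        pos = text.find("{", end)
    return found


response = 'The set {a, b} bounds f(n). {"label": "yes", "explanation": "uses {x}"}'
parsed = extract_last_json_object(response)
```

Because the decoder understands string quoting, curly braces inside the explanation text or inside JSON string values do not confuse the extraction.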

Update sqlalchemy to latest version

1

This will require us to remove usage of `autocommit=True` when initializing the engine, so we'll need to explicitly commit each change/transaction.

yadavsahil197

caching & state

Slow retrieval from cache

While labeling 25k examples, we see that the cache size grows to 500MB and the retrieval from cache takes 0.03s per example, which translates to around 10 minutes for 15k...

rajasbansal

caching & state

Broken tasks link in docs

1

https://github.com/refuel-ai/autolabel/blob/aa04feb500ad74834b609f11d5f24c866996a0c0/docs/index.md?plain=1#L7 I think this page is removed now?

nihit

documentation

Actual vs upper bound cost

2

Track token usage and cost during labeling runs. Modify the output of the cost computation during plan to clarify that this is an upper bound cost.

nihit

llm
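To illustrate the distinction, here is a minimal sketch that keeps the planned upper bound separate from the cost actually incurred. The `CostTracker` class and the per-1K-token prices are hypothetical; real prices depend on the provider and model, and this is not how autolabel computes cost.

```python
from dataclasses import dataclass


@dataclass
class CostTracker:
    # Hypothetical per-1K-token prices, supplied by the caller.
    prompt_price_per_1k: float
    completion_price_per_1k: float
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def upper_bound(self, prompt_tokens: int, max_completion_tokens: int,
                    num_examples: int) -> float:
        # Planning-time estimate: assume every completion uses the full token budget.
        per_example = (prompt_tokens * self.prompt_price_per_1k
                       + max_completion_tokens * self.completion_price_per_1k) / 1000
        return per_example * num_examples

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        # Called once per LLM response with the usage that was really consumed.
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens

    def actual(self) -> float:
        return (self.prompt_tokens * self.prompt_price_per_1k
                + self.completion_tokens * self.completion_price_per_1k) / 1000


tracker = CostTracker(prompt_price_per_1k=0.5, completion_price_per_1k=1.0)
planned = tracker.upper_bound(prompt_tokens=1000, max_completion_tokens=100, num_examples=10)
for _ in range(10):
    tracker.record(prompt_tokens=1000, completion_tokens=20)
spent = tracker.actual()
```

The planned figure assumes the maximum completion length for every example, so the recorded cost should never exceed it as long as the prompt size estimate holds.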

Provide a default `example_template` for different tasks when not specified by user

1

Currently, if `example_template` is not provided, `agent.plan` fails with an unintuitive error: ``` In [6]: agent.plan('docs/assets/movie_reviews.csv') ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:-- --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) Input In [6], in...

rishabh-bhargava
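A sketch of how a fallback could work, assuming templates are plain format strings keyed by task type. The task type names, the placeholder column names, and `resolve_example_template` are all hypothetical and chosen for illustration; the real defaults would need to match autolabel's task definitions.

```python
from typing import Dict, Optional

# Hypothetical defaults; task names and placeholders are illustrative only.
DEFAULT_EXAMPLE_TEMPLATES: Dict[str, str] = {
    "classification": "Input: {example}\nOutput: {label}",
    "question_answering": "Question: {question}\nAnswer: {answer}",
}


def resolve_example_template(task_type: str,
                             example_template: Optional[str] = None) -> str:
    # A user-supplied template always wins over the default.
    if example_template:
        return example_template
    try:
        return DEFAULT_EXAMPLE_TEMPLATES[task_type]
    except KeyError:
        # Fail early with a readable message instead of an AttributeError later.
        raise ValueError(
            f"No example_template given and no default exists for task type {task_type!r}"
        ) from None


template = resolve_example_template("classification")
rendered = template.format(example="great film", label="positive")
```

Even where no sensible default exists, raising a descriptive `ValueError` at configuration time would be clearer than the failure described above.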

Validation of label column needed for dataset being labeled

When a user attempts to run `agent.plan` or `agent.run` on a dataset, we should first validate that any data columns needed for labeling/evaluation are in the correct format. For example,...

rishabh-bhargava
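A minimal sketch of such a pre-flight check, assuming rows are dictionaries and that "correct format" means the required columns are present and the label value is one of a known set. The function `validate_columns` and its parameters are hypothetical; the actual checks would depend on the task type.

```python
from typing import Dict, Iterable, List, Sequence, Set


def validate_columns(rows: Iterable[Dict[str, str]],
                     required_columns: Sequence[str],
                     label_column: str,
                     allowed_labels: Set[str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the data passed."""
    errors = []
    # The label column is always required, even if the caller did not list it.
    columns = list(required_columns)
    if label_column not in columns:
        columns.append(label_column)
    for index, row in enumerate(rows):
        missing = [column for column in columns if column not in row]
        if missing:
            errors.append(f"row {index}: missing columns {missing}")
            continue
        if row[label_column] not in allowed_labels:
            errors.append(f"row {index}: unexpected label {row[label_column]!r}")
    return errors


sample_rows = [
    {"text": "a", "label": "positive"},
    {"text": "b", "label": "maybe"},
    {"text": "c"},
]
problems = validate_columns(sample_rows, ["text"], "label", {"positive", "negative"})
```

Collecting all problems before raising lets the user fix the dataset in one pass instead of hitting one error per run.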
