
🦄 Unitxt: a python library for getting data fired up and set for training and evaluation

Results 201 unitxt issues

ROUGE in the old unitxt had some preprocessing (something to do with sentence separation) that is not included here. This might affect the results. @gitMichal
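The missing step may be the common one of putting each sentence on its own line, because the `rougeLsum` variant in Google's `rouge_score` package treats newlines as sentence boundaries. Below is a minimal sketch under that assumption. The helper names are hypothetical, and the regex splitter is a naive stand-in for a proper sentence tokenizer such as `nltk.sent_tokenize`.

```python
import re

# Naive sentence boundary: whitespace that follows '.', '!' or '?'.
# A real tokenizer handles abbreviations, quotes, etc. far better.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences_for_rouge(text: str) -> str:
    """Hypothetical helper: put each sentence on its own line for rougeLsum."""
    sentences = [s.strip() for s in _SENTENCE_END.split(text.strip()) if s.strip()]
    return "\n".join(sentences)


def preprocess(predictions, references):
    """Apply the same sentence splitting to predictions and references."""
    return (
        [split_sentences_for_rouge(p) for p in predictions],
        [split_sentences_for_rouge(r) for r in references],
    )
```

Applying the same splitting to both sides matters: if only one side is split, `rougeLsum` compares differently segmented texts.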

Cap the maximum number of examples returned by the split random mix (e.g., who needs 5% of 1 trillion sentences for a test set?)
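A minimal sketch of the proposed cap, assuming a split is defined by a fraction of the source data plus an optional `max_examples` limit. Both function names and the `max_examples` parameter are hypothetical, not part of unitxt. Sampling indices from a `range` lets `random.sample` pick a few examples from a huge dataset without materializing it.

```python
import random
from typing import List, Optional, Sequence


def capped_split_size(total: int, fraction: float, max_examples: Optional[int] = None) -> int:
    """Number of examples a fraction-based split should take, optionally capped."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    size = int(total * fraction)  # floor of the requested fraction
    if max_examples is not None:
        size = min(size, max_examples)  # the cap wins for very large datasets
    return size


def sample_capped_split(
    examples: Sequence, fraction: float, max_examples: Optional[int] = None, seed: int = 42
) -> List:
    """Deterministically sample a capped split from an indexable sequence."""
    size = capped_split_size(len(examples), fraction, max_examples)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = rng.sample(range(len(examples)), size)  # no copy of the data is made
    return [examples[i] for i in indices]
```

With a cap of 1000, a 5% test split of 10**12 sentences yields 1000 examples instead of 50 billion.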

enhancement

I changed a card (added a preprocessing step), but the dataset was loaded from cache: 07/16/2023 13:49:32 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /Users/yoavkatz/cache/huggingface/datasets/unitxt___data/card=cards.sst2_sentiment,template_item=0/1.1.1/161c975966d35694e0db488ca61993c4a4cfb44975f0fa25e6aac6dc3806b97f/cache-d2a30425e116067b.arrow Need to...
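The cache path in the log appears to be keyed on the card name (`card=cards.sst2_sentiment`), not on the card's contents, so editing the card's steps may not produce a new cache entry. One possible direction, sketched here as an assumption rather than unitxt's actual caching logic, is to derive the cache key from a stable hash of the card's full definition. The function names and the dict layout of the card are hypothetical.

```python
import hashlib
import json
from typing import Any, Mapping


def card_fingerprint(card_config: Mapping[str, Any]) -> str:
    """Stable SHA-256 hash of a card's full definition, including preprocessing steps."""
    # sort_keys makes the hash independent of dict key order;
    # default=str is a crude fallback for values that are not JSON-serializable.
    canonical = json.dumps(card_config, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cache_key(card_name: str, card_config: Mapping[str, Any]) -> str:
    """Hypothetical cache key: card name plus a short content fingerprint."""
    return f"{card_name}-{card_fingerprint(card_config)[:16]}"
```

Adding a preprocessing step changes the fingerprint, and therefore the key, so a stale cached dataset would no longer be picked up.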

Uninformative name; was it meant to be multipleChoice?

ease-of-use

Adding support for relation-extraction task.

Some data is available in Hugging Face Spaces rather than in HF datasets. We'd like a custom loader for Hugging Face Spaces.
```
from typing import List, Mapping

from unitxt.loaders import Loader


class LoadFromHFSpace(Loader):
    user_name: str
    space_name: str
    data_files: Mapping[str, str]
    _requirements_list: List[str]
    ...
```
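A minimal standalone sketch of what such a loader could do, assuming `data_files` maps split names to file paths inside the Space repository. The class name `LoadFromHFSpaceSketch` and the `revision` field are hypothetical and this is not unitxt's `Loader` API. It relies on two things that exist on the Hub side: Space files are served at `https://huggingface.co/spaces/<user>/<space>/resolve/<revision>/<path>`, and `huggingface_hub.hf_hub_download` accepts `repo_type="space"`.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass
class LoadFromHFSpaceSketch:
    """Hypothetical stand-in for the proposed LoadFromHFSpace loader."""

    user_name: str
    space_name: str
    data_files: Mapping[str, str]  # split name -> file path inside the Space repo
    revision: str = "main"

    @property
    def repo_id(self) -> str:
        return f"{self.user_name}/{self.space_name}"

    def file_urls(self) -> Dict[str, str]:
        """Direct download URL for each split's file in the Space."""
        base = f"https://huggingface.co/spaces/{self.repo_id}/resolve/{self.revision}"
        return {split: f"{base}/{path}" for split, path in self.data_files.items()}

    def download(self) -> Dict[str, str]:
        """Download each split's file and return local paths (needs network access)."""
        # Imported lazily so the class can be used without huggingface_hub installed.
        from huggingface_hub import hf_hub_download

        return {
            split: hf_hub_download(
                repo_id=self.repo_id,
                filename=path,
                repo_type="space",
                revision=self.revision,
            )
            for split, path in self.data_files.items()
        }
```

Using `hf_hub_download` rather than raw URLs reuses the Hub's local cache and authentication; the URLs are mainly useful for inspection or for loaders that stream remote files.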

Implementation of select safety benchmarks used in the MLCommons AI Safety Benchmark (https://mlcommons.org/working-groups/ai-safety/ai-safety/). Based on code at https://github.com/mlcommons/modelgauge. Signed-off-by: Jonathan Bnayahu

These tasks currently use undocumented templates and tasks, which makes them harder for people to use. They are also not browsable in the exploration UI. https://github.com/IBM/unitxt/blob/main/prepare/cards/bold.py https://github.com/IBM/unitxt/blob/main/prepare/cards/atta_q.py