unitxt
🦄 Unitxt: a Python library for getting data fired up and set for training and evaluation
ROUGE in the old unitxt had some preprocessing that is not included here (something to do with sentence separation). This might affect the results. @gitMichal
Cap the maximum number of examples returned by the split random mix (e.g., who needs 5% of the examples of a 1 trillion sentence dataset for test?)
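A minimal sketch of the requested cap, written against plain Python sequences rather than unitxt's own split-mixing code. The function name `sample_split` and the `max_examples` parameter are hypothetical and only illustrate the idea: take the requested fraction, but never more than a fixed ceiling.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_split(
    examples: Sequence[T],
    fraction: float,
    max_examples: Optional[int] = None,  # hypothetical cap, not a unitxt parameter
    seed: int = 0,
) -> List[T]:
    """Randomly sample `fraction` of `examples`, limited to `max_examples` items."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    # Number of examples implied by the fraction alone.
    target = int(len(examples) * fraction)
    # Apply the cap only when one was given.
    if max_examples is not None:
        target = min(target, max_examples)
    rng = random.Random(seed)  # seeded for reproducible splits
    return rng.sample(list(examples), target)
```

With 1,000 examples and `fraction=0.05`, the uncapped call returns 50 items, while `max_examples=10` limits it to 10.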
I changed a card (added a preprocessing step), but the dataset was still loaded from the cache: 07/16/2023 13:49:32 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /Users/yoavkatz/cache/huggingface/datasets/unitxt___data/card=cards.sst2_sentiment,template_item=0/1.1.1/161c975966d35694e0db488ca61993c4a4cfb44975f0fa25e6aac6dc3806b97f/cache-d2a30425e116067b.arrow Need to...
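One way to think about this kind of stale-cache problem is a cache key derived from the card's content, so that any change to the card produces a different key. The sketch below is an illustration only: `card_fingerprint` and the dictionary layout of the card are hypothetical, and this is not how unitxt or the `datasets` library actually computes its cache paths.

```python
import hashlib
import json


def card_fingerprint(card: dict) -> str:
    """Return a short, stable hash of a card definition (hypothetical helper)."""
    # Serialize with sorted keys so the same content always yields the same string.
    canonical = json.dumps(card, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical card layouts, before and after adding a preprocessing step.
card_before = {"loader": "sst2", "preprocess_steps": ["rename_fields"]}
card_after = {"loader": "sst2", "preprocess_steps": ["rename_fields", "map_labels"]}

# The fingerprints differ, so a cache keyed on them would not be reused.
changed = card_fingerprint(card_before) != card_fingerprint(card_after)
```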
Adding support for relation-extraction task.
Some data is available in Hugging Face Spaces but not in HF Datasets. We'd like a custom loader for Hugging Face Spaces.
```
class LoadFromHFSpace(Loader):
    user_name: str
    space_name: str
    data_files: Mapping[str, str]
    _requirements_list: List[str]...
```
Implementation of select safety benchmarks used in the MLCommons AI Safety Benchmark (https://mlcommons.org/working-groups/ai-safety/ai-safety/). Based on code at https://github.com/mlcommons/modelgauge. Signed-off-by: Jonathan Bnayahu
These cards currently use undocumented templates and tasks, which makes them harder for people to use. They are also not browsable in the exploration UI. https://github.com/IBM/unitxt/blob/main/prepare/cards/bold.py https://github.com/IBM/unitxt/blob/main/prepare/cards/atta_q.py