unitxt
🦄 Unitxt: a python library for getting data fired up and set for training and evaluation
Today we have tasks such as `tasks.classification.multi_class` that are kept very general through fields like `class_type`, so sentiment and emotion classification can live under the same task by using...
```python
class DynamicFormat(Format):
    few_shot_format: Format
    few_shot_with_instruction_format: Format
    zero_shot_format: Format
    zero_shot_with_instruction_format: Format

    def process(self, instance):
        # An instruction is present when the key exists and is non-empty.
        has_instruction = "instruction" in instance and len(instance["instruction"]) > 0
        # Demos are present when the list is non-empty.
        has_demos = len(instance["demos"]) > 0
        if has_demos:
            ...
```
With the introduction of metrics that can send data to remote services, one needs a safe way to avoid accidentally sending proprietary/confidential data to external services. In the common...
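One possible shape for such a safeguard is sketched below. It is only an illustration: the `data_classification` field, the `DataPolicyError` exception, and the `check_allowed` helper are hypothetical names made up for this example, not unitxt API. The idea is that each instance carries classification tags, each remote component declares which tags it may receive, and anything unlabeled is rejected by default.

```python
class DataPolicyError(Exception):
    """Raised when an instance must not be sent to a given component (hypothetical)."""


def check_allowed(instance, allowed, component):
    """Refuse to pass `instance` to `component` unless all its tags are allowed.

    `instance["data_classification"]` is an assumed field holding a list of tags.
    """
    tags = set(instance.get("data_classification", []))
    if not tags:
        # Unlabeled data is treated as unsafe rather than silently sent out.
        raise DataPolicyError(f"{component}: instance has no data classification")
    disallowed = tags - set(allowed)
    if disallowed:
        raise DataPolicyError(
            f"{component}: not allowed to receive data tagged {sorted(disallowed)}"
        )


public_instance = {"text": "hello", "data_classification": ["public"]}
private_instance = {"text": "secret", "data_classification": ["proprietary"]}

# A remote metric that may only receive public data.
check_allowed(public_instance, allowed={"public"}, component="remote_metric")

try:
    check_allowed(private_instance, allowed={"public"}, component="remote_metric")
    blocked = False
except DataPolicyError:
    blocked = True
```

Failing closed on unlabeled data is a deliberate choice in this sketch: forgetting to tag a dataset then produces an error rather than a leak.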
This PR addresses the scattered nature of string/text operators across different modules (operators and processors), which makes them hard to locate and reuse, and makes it hard to maintain a consistent standard or track changes. The goal is...
@matanor `unitxt/src/unitxt/dataset.py` currently has `from .from .dataset_utils import get_dataset_artifact`. I just freshly cloned both fm-eval and unitxt, and rebuilt the envs. For me, when I try running the basic run_text2text.py...
This is a proposal for a new behaviour that allows changing card operators from the final command, such that `load_dataset("card=cards.wikitq,table_serializer=serializers.table.markdown")` loads wikitq with a different table serializer,...
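To make the proposed query form concrete, here is a minimal sketch of how such a string could be split into the card name and a set of overrides. `parse_recipe_query` is a hypothetical helper written for this illustration; it is not unitxt's actual parser, and how the overrides would be applied to the card is left open.

```python
def parse_recipe_query(query: str) -> dict:
    """Split 'k1=v1,k2=v2' into a dict (hypothetical helper, not unitxt API)."""
    params = {}
    for part in query.split(","):
        key, sep, value = part.partition("=")
        if not sep or not key.strip() or not value.strip():
            raise ValueError(f"expected key=value, got {part!r}")
        params[key.strip()] = value.strip()
    return params


query = "card=cards.wikitq,table_serializer=serializers.table.markdown"
params = parse_recipe_query(query)

# The card is named explicitly; every remaining key is treated as an override.
card = params.pop("card")
overrides = params
```

Under this reading, `card` is `"cards.wikitq"` and `overrides` holds the single entry `table_serializer`, which names the artifact that would replace the card's default serializer.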
Today, unitxt uses a default seed (42) for all datasets, and it is not actually possible to change it. Changing the seed could affect the dataset significantly given random choices,...
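The sketch below illustrates why the seed matters, using only Python's standard `random` module. `sample_demos` is a hypothetical stand-in for any seeded random choice (for example, picking demonstrations); it is not how unitxt implements sampling.

```python
import random


def sample_demos(pool, k, seed=42):
    """Pick k items from pool with a dedicated, seeded generator (illustrative only)."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    return rng.sample(pool, k)


pool = list(range(100))

# The same seed always yields the same selection.
first = sample_demos(pool, 5, seed=42)
second = sample_demos(pool, 5, seed=42)

# A different seed generally yields a different selection from the same pool.
other = sample_demos(pool, 5, seed=7)
```

Here `first` and `second` are identical, which is what makes runs reproducible, while `other` shows how a configurable seed would let users measure how sensitive their results are to these random choices.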