
🦄 Unitxt: a python library for getting data fired up and set for training and evaluation

Results 201 unitxt issues
Sort by recently updated

Today we have tasks such as `tasks.classification.multi_class` that are kept very general through fields like `class_type`, so sentiment and emotion classification can share the same task by using...

```python
class DynamicFormat(Format):
    few_shot_format: Format
    few_shot_with_instruction_format: Format
    zero_shot_format: Format
    zero_shot_with_instruction_format: Format

    def process(self, instance):
        # True when a non-empty instruction is present
        has_instruction = "instruction" in instance and len(instance["instruction"]) > 0
        # True when at least one demo is present
        has_demos = len(instance["demos"]) > 0
        if has_demos:
            ...
```

With the introduction of metrics that can send data to remote services, one needs a safe way to avoid accidentally sending proprietary/confidential data to external services. In the common...

This PR addresses the scattered nature of string/text operators across different modules—operators and processors—making them challenging to locate, reuse, and maintain a consistent standard while tracking changes. The goal is...

@matanor `unitxt/src/unitxt/dataset.py` currently has `from .from .dataset_utils import get_dataset_artifact`. I just freshly cloned both fm-eval and unitxt, and rebuilt the envs. For me, when I try running the basic run_text2text.py...

This is a proposal for a new behaviour that allows changing card operators from the final command, such that `load_dataset("card=cards.wikitq,table_serializer=serializers.table.markdown")` loads wikitq with a different table serializer,...
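To make the proposed query form concrete, here is a minimal sketch of how a `key=value,key=value` string could be split into a card name and a set of overrides. `parse_recipe_query` is a hypothetical helper written for illustration, not unitxt's actual parser, and treating every key other than `card` as a card-level override is an assumption about the proposal.

```python
from typing import Dict


def parse_recipe_query(query: str) -> Dict[str, str]:
    """Split 'key=value,key=value' into a dict.

    Hypothetical helper for illustration only; not part of unitxt.
    """
    params: Dict[str, str] = {}
    for part in query.split(","):
        key, sep, value = part.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or not key or not value:
            raise ValueError(f"expected key=value, got {part!r}")
        params[key] = value
    return params


query = "card=cards.wikitq,table_serializer=serializers.table.markdown"
params = parse_recipe_query(query)

# Assumption: 'card' names the card; everything else overrides a card field.
card_name = params.pop("card")
overrides = params
```

Under these assumptions, `card_name` holds the catalog name of the card and `overrides` holds the fields the command asks to replace, here only `table_serializer`.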

Today, unitxt uses a default seed (42) for all datasets, and it is not actually possible to change it. Changing the seed could affect the dataset significantly given random choices,...
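The effect of a fixed seed on random choices can be sketched with the standard library alone. `sample_demos` and its `seed` parameter are hypothetical names used for illustration and do not reflect how unitxt selects demonstrations internally; only the default value 42 comes from the issue text.

```python
import random
from typing import List


def sample_demos(pool: List[str], num_demos: int, seed: int = 42) -> List[str]:
    """Pick demos with a private, seeded generator (hypothetical helper)."""
    rng = random.Random(seed)  # isolated from the global random state
    return rng.sample(pool, num_demos)


pool = [f"example_{i}" for i in range(100)]

same_seed_a = sample_demos(pool, 3, seed=42)
same_seed_b = sample_demos(pool, 3, seed=42)
other_seed = sample_demos(pool, 3, seed=7)
```

Two calls with the same seed return the same demos, while a different seed will typically select a different set, which is why exposing the seed would change downstream results.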