unitxt
🦄 Unitxt: a Python library for getting data fired up and set for training and evaluation
The tests of ExtractFieldValues work well, but when I used it, I saw that the field was not added. The reason, I think, is that the process method modified the...
We have a request for another augmentor that adds whitespace at both the start and end of a string: "let's add up to 5 consecutive whitespaces at the beginning and end...
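A minimal sketch of what such an augmentor could do, written as a plain function rather than as a unitxt operator. The name `pad_with_whitespace`, its parameters, and the choice to use only space characters (not tabs or newlines) are assumptions made for illustration; the request above is truncated and does not specify them.

```python
import random


def pad_with_whitespace(text, max_spaces=5, rng=None):
    """Add between 0 and max_spaces spaces to each end of text.

    Hypothetical helper for illustration only; not part of the unitxt API.
    """
    rng = rng or random.Random()
    # Draw the two pad lengths independently so start and end can differ.
    prefix = " " * rng.randint(0, max_spaces)
    suffix = " " * rng.randint(0, max_spaces)
    return prefix + text + suffix
```

Passing a seeded `random.Random` as `rng` makes the augmentation reproducible, which helps when comparing runs.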
Computing confidence intervals for `GlobalMetric` objects ([here](https://github.com/IBM/unitxt/blob/main/src/unitxt/metrics.py#L115C6-L115C6)) requires recalculating the metric multiple times, depending on the [n_resamples](https://github.com/IBM/unitxt/blob/main/src/unitxt/metrics.py#L62) parameter. This recalculation may be costly in runtime for some...
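To illustrate why the cost grows with `n_resamples`, here is a generic percentile-bootstrap sketch. It is not the unitxt implementation; `bootstrap_ci`, `CountingMetric`, and `accuracy` are hypothetical names, and the metric is assumed to be a function of the whole prediction and reference lists.

```python
import random


def bootstrap_ci(metric_fn, predictions, references,
                 n_resamples=1000, confidence_level=0.95, seed=0):
    """Percentile bootstrap: the metric is recomputed once per resample."""
    rng = random.Random(seed)
    n = len(predictions)
    scores = []
    for _ in range(n_resamples):
        # Sample n instance indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric_fn([predictions[i] for i in idx],
                                [references[i] for i in idx]))
    scores.sort()
    alpha = (1 - confidence_level) / 2
    low = scores[int(alpha * (n_resamples - 1))]
    high = scores[int((1 - alpha) * (n_resamples - 1))]
    return low, high


class CountingMetric:
    """Wraps a metric function and counts how often it is called."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, predictions, references):
        self.calls += 1
        return self.fn(predictions, references)


def accuracy(predictions, references):
    return sum(p == r for p, r in zip(predictions, references)) / len(predictions)
```

Wrapping a metric in `CountingMetric` and running `bootstrap_ci` shows exactly `n_resamples` metric evaluations, so a metric that is slow to compute once becomes `n_resamples` times slower with confidence intervals enabled.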
Some metrics, such as Rouge, accept a tokenizer parameter for better support of non-English languages. It would be helpful to expose this option. https://discuss.huggingface.co/t/which-tokenizer-does-rouge-metric-uses-under-the-hood/19903 https://github.com/google-research/google-research/blob/e3d00617cb28064b6e96ab4e2485079f0ca5a763/rouge/rouge_scorer.py#L60 cc: @perlitz @yoavkatz @gitMichal
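The sketch below shows why a tokenizer choice matters for non-Latin scripts. `ascii_tokenize` is only an approximation of an ASCII-oriented tokenizer (lowercase, keep `[a-z0-9]` runs), not a copy of the linked code, and `UnicodeTokenizer` is a hypothetical replacement, assuming the scorer accepts any object exposing a `tokenize(text)` method.

```python
import re


def ascii_tokenize(text):
    """Approximate ASCII-only tokenization: keep runs of [a-z0-9]."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unicode_tokenize(text):
    """Keep any run of Unicode word characters, whatever the script."""
    return re.findall(r"\w+", text.lower())


class UnicodeTokenizer:
    """Hypothetical tokenizer object with the assumed tokenize() interface."""

    def tokenize(self, text):
        return unicode_tokenize(text)
```

With the ASCII-only variant, a Hebrew sentence yields no tokens at all, so any overlap-based score computed on it is meaningless; the Unicode variant keeps one token per word.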
- Not in canonical format (field_to_field etc.)
- Can only work in place
Move all field operators to inherit from the field operator (e.g. CastFields is complex and not generic). Try to create something simpler (with two or multiple interacting fields or with other generic use...
Today, templates can be defined that use fields that don't exist, or that don't use any fields at all.
1. test_card() should check all templates and not just one.
2. it...
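A minimal sketch of the kind of check described above, assuming templates are Python format strings with `{field}` placeholders. `template_fields` and `check_template` are hypothetical helpers, not part of `test_card()` or the unitxt API.

```python
from string import Formatter


def template_fields(template):
    """Return the set of placeholder names used in a format string."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def check_template(template, available_fields):
    """Return a list of problems found in the template (empty if none)."""
    used = template_fields(template)
    problems = []
    if not used:
        problems.append("template uses no fields")
    missing = sorted(used - set(available_fields))
    if missing:
        problems.append(f"unknown fields: {missing}")
    return problems
```

Running `check_template` over every template of a card, rather than over a single one, would surface both failure modes: a misspelled placeholder is reported as an unknown field, and a template with no placeholders is reported as using no fields.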
What is an instance: an example or a row? "Instance" can mean many things.
Make a small function that takes a card and adds the license from HF to it (if it was not manually added before). Then we can run it on all...