Yoav Katz
Yes, you are right. Since this is a correlation, [1,0] and [0,1] are indeed anti-correlated (-1). You can see what they did in f1 (and what they plan to do...
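To see why the two one-hot vectors come out as exactly -1, here is a minimal Pearson correlation computed directly from its definition. This is a plain-Python sketch for illustration, not the metric implementation used in the library; `pearson` is a hypothetical helper name.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Undefined (division by zero) if either sequence is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)

# One-hot vectors for two different classes move in opposite directions
print(pearson([1, 0], [0, 1]))  # -1.0
# Identical vectors are perfectly correlated
print(pearson([1, 0], [1, 0]))  # 1.0
```

With only two entries, any two non-constant vectors are either perfectly correlated or perfectly anti-correlated, which is why [1,0] vs. [0,1] lands on the extreme value.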
I think MultipleChoiceTemplate is an additional possible template, but it's only relevant for multi-class, not multi-label.
@matanor - Please advise. I saw these warnings too.
Hi. I added my comments. I think you should create a card that uses the tasks, loads the raw data from the file, and converts it to the format...
> Since this is an important NLP task, I suggest we try to get it merged asap:
>
> My suggestion is to follow the conventions and naming in the...
You need to add `evaluate_ensemble_judge.py` to `excluded_files` in `unitxt/tests/library/test_examples.py`:

```python
excluded_files = [
    "use_llm_as_judge_metric.py",
    "standalone_evaluation_llm_as_judge.py",
    "evaluate_summarization_dataset_llm_as_judge.py",
    "evaluate_different_formats.py",
    "evaluate_different_templates.py",
    "evaluate_different_demo_selections.py",
    "evaluate_a_judge_model_capabilities_on_arena_hard.py",
    "evaluate_a_model_using_arena_hard.py",
    "evaluate_llm_as_judge.py",
    "evaluate_using_metrics_ensemble.py",
    "evaluate_existing_dataset_by_llm_as_judge.py",
    "evaluate_ensemble_judge.py",  # add this entry
]
```

Without it, the regression tries to run your...
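Assuming the example test simply matches each example script's file name against `excluded_files` (an assumption; the actual test logic may differ), the effect of the exclusion list can be sketched as follows. `examples_to_run` and `qa_evaluation.py` are hypothetical names used only for illustration.

```python
from pathlib import Path

# Hypothetical, shortened exclusion list for illustration only
excluded_files = [
    "use_llm_as_judge_metric.py",
    "evaluate_ensemble_judge.py",
]

def examples_to_run(example_paths, excluded):
    """Keep only example scripts whose file name is not in the exclusion list."""
    excluded_set = set(excluded)
    return [p for p in example_paths if Path(p).name not in excluded_set]

# "qa_evaluation.py" is a made-up example name
paths = ["examples/evaluate_ensemble_judge.py", "examples/qa_evaluation.py"]
print(examples_to_run(paths, excluded_files))  # ['examples/qa_evaluation.py']
```

Matching on the bare file name means the entry only needs the script's name, not its path.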
Hi @welisheva22 - Can you update this PR with the above suggested changes?
This PR has 4 changes:
1. Add log inference to the WML inference engine.
2. Add the ability to use `space_id` instead of `project_id` credentials in WML.
3. Add the ability to...
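A minimal sketch of how credentials might accept either a `space_id` or a `project_id`. The helper `build_wml_credentials` and the rule that exactly one of the two must be given are assumptions for illustration, not the engine's actual API.

```python
from typing import Dict, Optional

def build_wml_credentials(url: str, api_key: str,
                          project_id: Optional[str] = None,
                          space_id: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: requires exactly one of project_id / space_id."""
    if (project_id is None) == (space_id is None):
        raise ValueError("Provide exactly one of project_id or space_id")
    creds = {"url": url, "api_key": api_key}
    if space_id is not None:
        creds["space_id"] = space_id
    else:
        creds["project_id"] = project_id
    return creds

# Placeholder values, not real credentials
print(build_wml_credentials("https://example.com", "MY_API_KEY", space_id="MY_SPACE_ID"))
```

Rejecting the case where both (or neither) are set avoids silently picking one of them.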
The problem is when operators modify an instance that contains nested dictionaries:

```
predictions: {"a": 3, "b": 4}
references: [{"a": 5, "b": 6}]
```

Then we have an operator...
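Assuming the issue is that an operator mutates nested values in place on an instance it received through a shallow copy (the comment is truncated, so this is an assumption), the aliasing can be reproduced in plain Python. A shallow copy shares the nested containers, while `copy.deepcopy` does not.

```python
import copy

# Example instance with nested containers (values from the comment above)
instance = {
    "predictions": {"a": 3, "b": 4},
    "references": [{"a": 5, "b": 6}],
}

# A shallow copy duplicates only the top-level dict: the nested
# "predictions" dict and "references" list are still shared.
shallow = dict(instance)
shallow["predictions"]["a"] = 100
print(instance["predictions"]["a"])  # 100: the original instance changed as well

# Restore the value, then take a deep copy, which duplicates every nested container.
instance["predictions"]["a"] = 3
deep = copy.deepcopy(instance)
deep["predictions"]["a"] = 100
deep["references"][0]["b"] = 0
print(instance["predictions"]["a"], instance["references"][0]["b"])  # 3 6
```

Deep-copying is the simplest safeguard, at the cost of copying the whole instance on every operator call.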
@dafnapension - If possible, please give this priority, because we want to make a new release this week.