Add logprobs functionality
This PR has 4 changes:
- Add logprobs inference to the WML inference engine
- Add the ability to use space_id instead of project_id credentials with WML
- Add the ability to calculate confidence intervals for multiple scores
- Add a post processor that converts a log probs dictionary to the probability of a specific class
@elronbandel - I think your concerns are only around (4).
@pawelknes - can you please review (1) + (2)? @elronbandel - can you review (3)?
Regarding (4) - I think @elronbandel thinks it is better addressed in an inference engine, but right now the changes are very localized.
@arielge - can you say how you are going to use the new post processor?
Regarding (4) - the desired design here is described in this issue: https://github.com/IBM/unitxt/issues/1128
The benefit of implementing it this way is that the algorithm @arielge is suggesting will be compatible with many tasks, such as multiple choice and classification tasks, and will work out of the box with all the related templates and tasks without requiring any modifications.
Thanks @yoavkatz @elronbandel. The intention was to use the post processor as part of a flow that includes an inference engine. For example, you can have a judge metric that is initialized with a specific template and task, processes the user inputs with the given template, and sends them to the inference engine. The inference engine outputs are then fed to the post processor to get the desired score. @elronbandel I understand your desire to integrate this into the inference engine. This is also a possibility, although I assume different users may want slightly different things (inferring for every class is one option, inferring once and looking at the top probabilities is another, and binary and multi-class cases may not be treated the same). Currently we are aiming for the flexibility to mix and match inference engines, templates and post processors, without necessarily imposing our approach on the inference engine logic or requiring extra fields (as we are dealing specifically with the yes/no answer use case and not a general multi-class classification solution).
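To make the post-processing step concrete, here is a rough sketch of the yes/no case. It is not the code in this PR: the function name `class_probability`, its parameters, and the assumed input shape (a dictionary mapping top candidate tokens to their log probs for the first generated position) are all hypothetical, and the renormalization over the known classes is just one possible choice.

```python
import math
from typing import Dict, Sequence


def class_probability(
    top_logprobs: Dict[str, float],
    target: str = "yes",
    classes: Sequence[str] = ("yes", "no"),
) -> float:
    """Hypothetical helper: probability of `target`, renormalized over `classes`.

    `top_logprobs` is assumed to map candidate tokens to log probabilities.
    """
    if target not in classes:
        raise ValueError(f"target {target!r} must be one of {list(classes)}")

    # Sum probability mass per class, ignoring case and surrounding whitespace,
    # so that tokens such as "Yes" and " yes" count toward the same class.
    mass = {c: 0.0 for c in classes}
    for token, logprob in top_logprobs.items():
        key = token.strip().lower()
        if key in mass:
            mass[key] += math.exp(logprob)

    total = sum(mass.values())
    if total == 0.0:
        # Assumption: none of the class tokens appeared among the top
        # candidates, so fall back to a score of 0.0.
        return 0.0
    return mass[target] / total


# Usage: tokens outside the class set ("maybe") are ignored.
example = {"Yes": math.log(0.6), " no": math.log(0.2), "maybe": math.log(0.2)}
score = class_probability(example, target="yes")
```

Keeping this as a standalone function over the engine output is what allows it to be swapped independently of the inference engine and the template.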
Superseded by #1243 and #1205.