Support for handling sensitive data sent to remote services

Open pawelknes opened this issue 1 year ago • 6 comments

Added modifications to the loaders and the RemoteMetric class so that a metric can check whether it is allowed to handle data that is sent to a remote service, as per #646.
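In outline, the mechanism works like this (a minimal sketch with illustrative names, not the PR's exact code): the loader tags every instance with a classification, and a remote metric checks the tag before sending the instance out.

```python
# Minimal sketch only -- field and attribute names are illustrative, not the PR's.

def tag_instances(instances, classification):
    # Loader side: attach the data classification to every loaded instance.
    for instance in instances:
        instance["data_classification"] = classification
        yield instance

def check_allowed(instance, allowed_classifications):
    # RemoteMetric side: refuse to send instances the remote service may not receive.
    classification = instance.get("data_classification")
    if classification is not None and classification not in allowed_classifications:
        raise ValueError(
            f"'{classification}' data may not be sent to this remote service."
        )
```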

pawelknes avatar May 08 '24 07:05 pawelknes

I think it might simplify things if you use the AddFields operator to add the classification:

```python
# Inside the loader: tag every loaded instance with the configured classification.
self.add_data_classification = AddFields({"data_classification": self.data_classification})
multi_stream = MultiStream.from_iterables(dataset)
return self.add_data_classification(multi_stream)
```

I think this approach also makes more sense from the perspective of code responsibility separation.

Also do we want to have some sort of hierarchy of data classification types?
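Purely as an illustration of that idea (the level names are hypothetical, not something the PR defines), a hierarchy could order classifications by sensitivity, so that clearance for one level implies clearance for everything below it:

```python
# Hypothetical hierarchy, for illustration only: higher index = more sensitive.
LEVELS = ["public", "proprietary", "confidential"]

def may_handle(artifact_max_level: str, instance_level: str) -> bool:
    # An artifact cleared up to a given level may also handle anything less sensitive.
    return LEVELS.index(instance_level) <= LEVELS.index(artifact_max_level)

assert may_handle("proprietary", "public")      # less sensitive: allowed
assert not may_handle("public", "proprietary")  # more sensitive: blocked
```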

Thanks for the suggestion

@yoavkatz what about this hierarchy?

pawelknes avatar May 13 '24 08:05 pawelknes

Hi @pawelknes . Thanks for the PR. It looks very good.

Since the issue started, we have seen the need for this to expand beyond metrics. For example, someone added a Translate post-processor that sends the predicted data to the IBM Cloud translation service. There is also a new InferenceEngine class that sends data to 3rd-party services (used by the LLM-as-judge metrics).

So perhaps instead of making it metric-centric, we should make it more generic (applicable to all Artifacts).

Similar to how class ComputeExpressionMixin(Artifact) adds the ability to compute expressions.
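For instance, the generic variant might look roughly like this (a sketch only; the class and attribute names are assumptions, not the final API):

```python
from typing import List, Optional

# Sketch of a generic mixin that any Artifact could inherit; names are assumptions.
class DataClassificationMixin:
    # None means the user did not explicitly set a policy.
    data_classification_policy: Optional[List[str]] = None

    def verify_data_classification(self, instance: dict) -> dict:
        classification = instance.get("data_classification")
        if self.data_classification_policy is not None and classification is not None:
            if classification not in self.data_classification_policy:
                raise ValueError(
                    f"{type(self).__name__} is not allowed to process "
                    f"'{classification}' data."
                )
        return instance
```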

What do you think?

yoavkatz avatar May 13 '24 12:05 yoavkatz

Sure, that sounds good, and I can work on this. Though, do we want to add this in the current PR, or first merge it and then create a new issue for expanding it to the general case?

pawelknes avatar May 13 '24 13:05 pawelknes

I tend to think that we should do it now, because in any case we will need to document it in detail and instruct the users, and we probably don't want to do that twice. Looking at the code, it may not be a huge change: more a matter of renaming and moving the code from metrics to Artifact.

yoavkatz avatar May 13 '24 13:05 yoavkatz

Okay, so I'll start working on it

pawelknes avatar May 13 '24 13:05 pawelknes

@yoavkatz I've moved the functionalities so that they can be used by any artifact. As for the operator responsible for verification, I made it accept either an artifact name (in which case it reads the respective policy from an environment variable) or a Python object (Accuracy etc.), in which case it reads the policy from the object's attribute.
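Roughly, the dual input could be resolved like this (a sketch; the function and the environment-variable name are illustrative, not the PR's exact API):

```python
import json
import os
from typing import List, Optional, Union

# Illustrative sketch: resolve a policy from either an artifact name or an object.
def resolve_policy(artifact: Union[str, object]) -> Optional[List[str]]:
    if isinstance(artifact, str):
        # Name given: read a JSON mapping of artifact names to allowed
        # classifications from an environment variable (variable name is hypothetical).
        policies = json.loads(os.environ.get("UNITXT_DATA_CLASSIFICATION_POLICY", "{}"))
        return policies.get(artifact)
    # Object given (e.g. an Accuracy instance): read the policy from its attribute.
    return getattr(artifact, "data_classification_policy", None)
```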

pawelknes avatar May 15 '24 10:05 pawelknes

Pawel - thanks for this PR. It's an important one, especially now that we have LLM-as-judge metrics. I've made some additions and changes to the PR - I'd be glad for your review.

  1. I added default data classification policies to the loaders, so users will not always be required to add them.
  2. Reviewed and added more documentation.
  3. Made the use of the environment variable more consistent with other places.
  4. Made the default policy None in Artifact, so we know whether or not the user explicitly set it (sketched below).
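Regarding point 4, the benefit of a None default is that code can tell "never set" apart from "explicitly set" (a sketch with illustrative names, not the PR's code):

```python
# Sketch of point 4 (names illustrative): with None as the default, an unset
# policy can fall back to an artifact's built-in default from point 1.
def effective_policy(artifact, built_in_default):
    if artifact.data_classification_policy is None:
        return built_in_default  # user never set a policy explicitly
    return artifact.data_classification_policy  # user's explicit choice wins
```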

yoavkatz avatar May 22 '24 16:05 yoavkatz

@pawelknes - I think the next steps are:

  1. Remove allow_passing_data_to_remote_api from settings.

  2. In class InferenceEngine(abc.ABC, Artifact): Make sure infer() checks the data classification of the inferred instance (best to do it in the base method, so people writing inference engines won't need to handle it; see the sketch below). Make BAM and WML handle 'public' and 'proprietary' data by default.

  3. LlamaIndexCorrectness(InstanceMetric) - make it process only 'public' data by default.

  4. class RemoteMetric(SingleStreamOperator, Metric) - make it handle 'public' and 'proprietary' data by default.

yoavkatz avatar May 23 '24 12:05 yoavkatz

Yup, I'll make these changes

pawelknes avatar May 23 '24 13:05 pawelknes