Design tagging system for data

zyteka opened this issue

Many transformers need to save some information about the data they transform so that they can later transform the corresponding explanations. In some cases, this is a single value per transformer (such as the list of features that were encoded). These values can simply be stored as instance variables on the transformer object itself; pyreal already supports this.
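For comparison, the per-transformer case could look something like the hypothetical sketch below. It is not pyreal's actual transformer API: the class name OneHotEncoderSketch, the fit/transform_explanation signatures, and representing rows as dicts are all assumptions made for illustration. The point is that everything transform_explanation needs (the encoded columns and their categories) lives on self.

```python
class OneHotEncoderSketch:
    """Hypothetical transformer that keeps its per-transformer state on self."""

    def __init__(self, columns):
        self.columns = columns  # features to one-hot encode
        self.categories = {}    # column -> sorted categories seen during fit

    def fit(self, rows):
        # rows: list of dicts mapping feature name -> value
        for col in self.columns:
            self.categories[col] = sorted({row[col] for row in rows})
        return self

    def data_transform(self, rows):
        out = []
        for row in rows:
            # Keep non-encoded features, then add one 0/1 column per category.
            new_row = {k: v for k, v in row.items() if k not in self.columns}
            for col in self.columns:
                for cat in self.categories[col]:
                    new_row[f"{col}_{cat}"] = 1 if row[col] == cat else 0
            out.append(new_row)
        return out

    def transform_explanation(self, contributions):
        # contributions: dict of encoded feature name -> contribution.
        # Sum the one-hot contributions back onto the original feature,
        # using only the state saved on self.
        merged = dict(contributions)
        for col in self.columns:
            merged[col] = sum(merged.pop(f"{col}_{cat}") for cat in self.categories[col])
        return merged
```

Because the saved state does not depend on which rows are being explained, it can safely live on the transformer.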

Sometimes, however, the stored information is specific to each individual row of data being transformed. For example, a transformer that pads variable-length inputs to a common length needs to keep track of the original length of each row. In this case, the information must be stored alongside the data itself, somewhere transformers can later access it.
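To make the padding example concrete, here is a hypothetical sketch (not pyreal code; the class PadTransformerSketch and the transform_explanation(contributions, lengths) signature are assumptions). It shows that the original lengths are produced by data_transform but needed again by transform_explanation, and that they belong to a specific batch of rows rather than to the transformer.

```python
class PadTransformerSketch:
    """Hypothetical transformer that pads variable-length rows to a common length."""

    def __init__(self, length, pad_value=0):
        self.length = length
        self.pad_value = pad_value

    def data_transform(self, rows):
        # The per-row original lengths describe *these* rows, so they are
        # returned alongside the padded data instead of being stored on self.
        lengths = [len(row) for row in rows]
        if any(n > self.length for n in lengths):
            raise ValueError("row is longer than the target length")
        padded = [list(row) + [self.pad_value] * (self.length - n)
                  for row, n in zip(rows, lengths)]
        return padded, lengths

    def transform_explanation(self, contributions, lengths):
        # Drop the contributions assigned to padding positions, row by row.
        return [list(row[:n]) for row, n in zip(contributions, lengths)]
```

The open question is where lengths should live between the two calls: storing it on self would only remember the most recently transformed batch.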

There are several ways we could approach this. For this issue, we should come up with a comprehensive plan that chooses one method and justifies the choice.

Some options to consider:

- Creating a Data class that holds a dictionary of optional tags, which transformers can add and access. This is very flexible and leaves room to add more Data functionality in the future, but it would require rewriting all of pyreal to use this Data class.
- Building tags into the existing explanation system: Explanation objects would hold tags passed to them by Explainers, which in turn would receive the tags when they run their transformers' data_transform methods.
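The first option might look roughly like the following. This is a hypothetical sketch, not existing pyreal code: the Data class, its rows/tags fields, the "original_lengths" tag key, and the transform_explanation(contributions, data) signature are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Data:
    """Hypothetical Data wrapper (option 1): the rows plus free-form tags."""
    rows: list
    tags: dict = field(default_factory=dict)


class PadTransformer:
    """Hypothetical padding transformer written against the Data class."""

    def __init__(self, length, pad_value=0):
        self.length = length
        self.pad_value = pad_value

    def data_transform(self, data: Data) -> Data:
        lengths = [len(row) for row in data.rows]
        if any(n > self.length for n in lengths):
            raise ValueError("row is longer than the target length")
        padded = [list(row) + [self.pad_value] * (self.length - n)
                  for row, n in zip(data.rows, lengths)]
        # Copy the incoming tags so tags from earlier transformers survive,
        # then attach this transformer's per-row tag.
        tags = dict(data.tags)
        tags["original_lengths"] = lengths
        return Data(padded, tags)

    def transform_explanation(self, contributions, data: Data):
        # Read the per-row tag back from the transformed Data object.
        lengths = data.tags["original_lengths"]
        return [list(row[:n]) for row, n in zip(contributions, lengths)]
```

One design question this surfaces: two transformers that use the same tag key would overwrite each other, so keys would likely need to be namespaced (for example, per transformer). Under the second option, data_transform would instead return the tags alongside the transformed data, and the Explainer would attach them to the Explanation object it produces; the transformer logic would be largely the same, but the tags would travel with the Explanation rather than with the data.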

Jul 21 '22 13:07 zyteka