DiCE
Enhancement in CF Generation
Hi, I created this pull request to give some hints on CF generation @gaugup.
Regarding Random: although the method is very clear and fast, it is unclear how feature sampling and substitution are combined. Instead of gradually replacing more features, the loop in your code only ever replaces one feature per sample, because of:
selected_features = np.random.choice(self.features_to_vary, (sample_size, 1), replace=True)
The 1 should be replaced by num_features_to_vary, and the assignment should then use .loc instead of .at.
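To make the suggestion concrete, here is a minimal sketch of the idea, not the actual DiCE code. The function name `sample_random_variations`, the `feature_ranges` dictionary and the restriction to continuous features are assumptions made for illustration. Because `replace=True` can pick the same feature twice in one row, the sketch drops duplicates, so a row varies at most `num_features_to_vary` features.

```python
import numpy as np
import pandas as pd


def sample_random_variations(query, features_to_vary, feature_ranges,
                             sample_size, num_features_to_vary, seed=0):
    """Hypothetical helper: build candidates that vary several features per row.

    query          -- single-row DataFrame with float columns
    feature_ranges -- assumed dict {feature_name: (low, high)}
    """
    rng = np.random.default_rng(seed)
    # Start from sample_size copies of the query instance.
    candidates = pd.concat([query] * sample_size, ignore_index=True)
    # One row of feature names per candidate; (sample_size, 1) in the
    # original code becomes (sample_size, num_features_to_vary).
    selected = rng.choice(features_to_vary,
                          (sample_size, num_features_to_vary), replace=True)
    for row, feats in enumerate(selected):
        # Drop repeated names while keeping order.
        feats = list(dict.fromkeys(str(f) for f in feats))
        new_values = [rng.uniform(*feature_ranges[f]) for f in feats]
        # .loc accepts a list of columns, whereas .at addresses one cell only.
        candidates.loc[row, feats] = new_values
    return candidates


query = pd.DataFrame({"age": [30.0], "hours": [40.0], "income": [50.0]})
ranges = {"age": (18.0, 90.0), "hours": (1.0, 99.0)}
cfs = sample_random_variations(query, ["age", "hours"], ranges,
                               sample_size=5, num_features_to_vary=2)
print(cfs)
```

Features outside `features_to_vary` (here `income`) are left untouched, and the per-row loop is what makes this variant slower than the single-feature version.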
This method is slower but more complete, and it is still faster than Genetic/KDtree (I have deliberately left it commented out for you). If you prefer to keep the single-feature variation, I suggest changing .at to ._get_value in the replacement step for faster access.
As far as Genetic is concerned, on datasets with many features the random initialization is very slow and seems never to finish. For this reason, I suggest increasing the population taken from the KDtree initialization (which also lowers the initialization time considerably). In addition, I recommend switching to a binary search when a large number of CFs is requested.
Thanks @giandos200 for this PR. I executed all the gates. Could you please examine the failures and re-submit so that all the tests and linting pass? It looks like your PR has a lot of changes that are out of scope for the performance improvement. It would be great if you could clean this up and send a commit focusing just on the perf improvements.
Regards,
It should be ok now, @gaugup. Maybe I had an older version, because I never changed the imports.