differences
Package cannot handle NAs in design matrix when utilizing 'split_sample_by'
There is a bug in the following line: https://github.com/bernardodionisi/differences/blob/86b15a2cd3a8235a3287f0cb5dc963a04b504f6f/src/differences/attgt/attgt.py#L339
It causes a KeyError when you use the 'split_sample_by' feature on a dataset that contains NA values in columns of the design matrix. In that case, the code passes only the data[split_sample_by] column instead of the full data object, which is what gets passed when there are no NA values. The error surfaces later, when parse_split_sample() tries to index that column again here:
https://github.com/bernardodionisi/differences/blob/86b15a2cd3a8235a3287f0cb5dc963a04b504f6f/src/differences/attgt/difference.py#L450
That line should instead read:
else self.data.loc[
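The failure mode can be reproduced with plain pandas. This is a minimal sketch, not the package's code: `parse_split_sample_sketch` is a hypothetical stand-in for `parse_split_sample()`, and the toy columns `group` and `x` are made up. It assumes only that the real function indexes the split column on whatever object it receives. Passing a Series selected with `.loc[mask, col]` means the later `data[col]` lookup runs against the Series' row index and raises `KeyError`. Passing the full frame restricted to complete rows works.

```python
import numpy as np
import pandas as pd


def parse_split_sample_sketch(data, split_sample_by):
    # Hypothetical stand-in for parse_split_sample(): it expects a DataFrame
    # and looks up the split column itself.
    return sorted(data[split_sample_by].unique())


df = pd.DataFrame(
    {
        "group": ["a", "a", "b", "b"],  # column used for split_sample_by
        "x": [1.0, np.nan, 3.0, 4.0],  # design-matrix covariate with an NA
    }
)
complete = df["x"].notna()  # rows without NA in the design matrix

# Buggy pattern: only the split column survives, as a Series named "group".
series_only = df.loc[complete, "group"]

# Fixed pattern: keep the full frame, restricted to complete rows.
full_frame = df.loc[complete]

print(parse_split_sample_sketch(full_frame, "group"))  # ['a', 'b']

try:
    # Indexing a Series by "group" looks up a row label, not a column.
    parse_split_sample_sketch(series_only, "group")
except KeyError as err:
    print("KeyError:", err)
```

Because dropping the NA rows already happens in the `.loc` row selector, keeping the column selector out is enough to preserve the DataFrame shape that the downstream code expects.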
I have submitted a pull request. Please let me know if you agree with this diagnosis of the issue.
Hi, apologies for missing this issue; I have not checked in for a while. I will try to take a look at the PR soonish.