Package cannot handle NAs in design matrix when utilizing 'split_sample_by'

Open johnkohler00 opened this issue 1 year ago • 2 comments

There is a bug in the following line: https://github.com/bernardodionisi/differences/blob/86b15a2cd3a8235a3287f0cb5dc963a04b504f6f/src/differences/attgt/attgt.py#L339 that causes a KeyError when using the 'split_sample_by' feature on a dataset that contains NA values in columns of the design matrix. When NAs are present, the code passes only the data[split_sample_by] column rather than the full data object, which is what it passes when there are no NA values. This causes an error later on, when parse_split_sample() attempts to index the column again here: https://github.com/bernardodionisi/differences/blob/86b15a2cd3a8235a3287f0cb5dc963a04b504f6f/src/differences/attgt/difference.py#L450 The original line should read else self.data.loc[
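To illustrate the failure mode, here is a minimal sketch in plain pandas. It does not use the package's actual code: `parse_split` is a hypothetical stand-in for parse_split_sample(), and the column names and the NA filter are assumptions made up for the example. It only shows why indexing by column name fails once a single column has been passed instead of the full frame.

```python
import pandas as pd


def parse_split(data, split_sample_by):
    # Hypothetical stand-in for parse_split_sample(): it expects a DataFrame
    # and looks the split column up again by name.
    return data[split_sample_by].unique().tolist()


# Toy data (assumed for illustration): one NA in a design-matrix column.
df = pd.DataFrame(
    {
        "y": [1.0, 2.0, 3.0, 4.0],
        "x": [0.5, None, 1.5, 2.5],
        "group": ["a", "a", "b", "b"],
    }
)

keep = df["x"].notna()  # rows without NA in the design matrix

# Passing only the split column yields a Series, which has no "group" label
# in its index, so the second lookup raises KeyError.
try:
    parse_split(df.loc[keep, "group"], "group")
    series_failed = False
except KeyError:
    series_failed = True

# Passing the full (row-filtered) frame keeps the column available by name.
groups = parse_split(df.loc[keep], "group")

print(series_failed, groups)
```

Under these assumptions, selecting rows only and leaving the columns intact avoids the second lookup failing.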

johnkohler00, Jan 17 '24 18:01

I have submitted a pull request. Please let me know if you agree with this diagnosis of the issue.

johnkohler00, Jan 17 '24 18:01

Hi, I apologize for missing this issue; I have not checked in for a while. I will try to take a look at the PR soonish.

bernardodionisi, May 30 '24 14:05