Error when two dataframes are empty

Open bms63 opened this issue 5 years ago • 4 comments

Hi,

Love your package.

When I have two empty dataframes, I would like to get back information that they are both empty and have the same variables. At the moment, diffdf throws an error for this case. I know it is a bit of a silly use case, but PROC COMPARE in SAS handles it with no issues, and I am hoping diffdf could do the same. Please let me know if I need to provide more detail.

Here is the SAS Output when the base and compare are both empty:

Variables Summary

    Number of Variables in Common: 8.

Observation Summary

    Number of Observations in Common: 0.
    Total Number of Observations Read from WORK.ACTUAL: 0.
    Total Number of Observations Read from WORK.QC: 0.

    Number of Observations with Some Compared Variables Unequal: 0.
    Number of Observations with All Compared Variables Equal: 0.

bms63 avatar Nov 27 '19 18:11 bms63

Hey @bms63,

Apologies for the delayed reply; I have been travelling a lot recently for work!

Thanks for highlighting this. Agreed, this should probably be changed to provide a more sensible result. I will try to make some changes to account for this!

gowerc avatar Dec 10 '19 18:12 gowerc

I thought I'd replied to this, sorry! Yes, I agree that this is a bug, as there's still information to report when we have empty dataframes.

kieranjmartin avatar Dec 11 '19 14:12 kieranjmartin

Looking at this a bit more, I think there are two updates to be made. First, it shouldn't error. Second, it would potentially add value to provide a summary of what is actually in each dataset in the report, e.g. dataset A consists of 10 rows, 6 columns, etc.

If you agree, @kieranjmartin, then I will probably just use this ticket to solve part 1 and will make a separate ticket for part 2, as that is a slightly different set of changes.

gowerc avatar Feb 01 '20 21:02 gowerc

Happy to approach it that way; a new issue to discuss how much information to give would be useful. For instance, there are certain checks you can still do when comparing empty datasets if they have columns.
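
To make the idea of column-level checks on empty datasets a bit more concrete, here is a rough sketch in Python. It is not diffdf's implementation and not its API: `Table` and `compare_schemas` are hypothetical names invented purely for illustration, and a "table" is assumed to be just a mapping from column name to a type name and a list of values. The point is only that the shape, the shared columns, and the type mismatches can all be reported even when both inputs have zero rows.

```python
from dataclasses import dataclass


@dataclass
class Table:
    """Hypothetical stand-in for a data frame (an assumption for this sketch).

    columns maps a column name to (type name, list of values).
    """
    columns: dict

    @property
    def n_rows(self):
        # All columns are assumed to have the same length; a table with
        # no columns at all is treated as having zero rows.
        return max((len(values) for _, values in self.columns.values()), default=0)


def compare_schemas(base, comp):
    """Report what can be checked without looking at any row values."""
    common = [name for name in base.columns if name in comp.columns]
    return {
        "base_shape": (base.n_rows, len(base.columns)),
        "comp_shape": (comp.n_rows, len(comp.columns)),
        "common_columns": common,
        "only_in_base": [n for n in base.columns if n not in comp.columns],
        "only_in_comp": [n for n in comp.columns if n not in base.columns],
        # Columns present in both tables whose declared types differ.
        "type_mismatches": {
            n: (base.columns[n][0], comp.columns[n][0])
            for n in common
            if base.columns[n][0] != comp.columns[n][0]
        },
    }


if __name__ == "__main__":
    # Two empty tables: zero rows, but the columns still carry information.
    base = Table({"id": ("int", []), "name": ("str", [])})
    comp = Table({"id": ("float", []), "age": ("int", [])})
    for key, value in compare_schemas(base, comp).items():
        print(f"{key}: {value}")
```

None of these checks read row values, so they work the same way for empty and non-empty inputs, which is roughly what the SAS output earlier in this thread reports for the empty case.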

kieranjmartin avatar Feb 04 '20 13:02 kieranjmartin

~Re-reading the conversation here it looks like there was some confusion around which issue tracks what. To clarify I will use to track the null dataset issue and will use #100 to track the dataset summary enhancement.~

gowerc avatar Jul 08 '24 15:07 gowerc