diffdf
Error when two dataframes are empty
Hi,
Love your package.
When I have two empty dataframes, I would like to get back information that they are both empty and have the same variables. At the moment, diffdf throws an error in this case. I know it is a bit of a silly use case, but PROC COMPARE in SAS can do this with no issues. Hoping diffdf could do it as well. Please let me know if I need to provide more detail.
Here is the SAS Output when the base and compare are both empty:
Variables Summary
Number of Variables in Common: 8.
Observation Summary
Number of Observations in Common: 0.
Total Number of Observations Read from WORK.ACTUAL: 0.
Total Number of Observations Read from WORK.QC: 0.
Number of Observations with Some Compared Variables Unequal: 0.
Number of Observations with All Compared Variables Equal: 0.
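For reference, the case described above can be reproduced with something like the following. This is a minimal sketch: the column names `id` and `value` are made up for illustration, and whether the call errors or returns a report depends on the installed diffdf version, so the error is captured rather than assumed.

```r
# Two zero-row data frames that share the same columns
# (column names are illustrative, not taken from the original report)
base <- data.frame(id = integer(0), value = character(0), stringsAsFactors = FALSE)
compare <- data.frame(id = integer(0), value = character(0), stringsAsFactors = FALSE)

# Guard the call so the snippet still runs if diffdf is not installed,
# and capture any error message instead of stopping the script
if (requireNamespace("diffdf", quietly = TRUE)) {
  result <- tryCatch(
    diffdf::diffdf(base, compare),
    error = function(e) conditionMessage(e)
  )
  print(result)
}
```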
Hey @bms63 ,
Apologies for the delay in replying, I have been travelling a lot recently for work!
Thanks for highlighting this though. Agreed, this should probably be changed to provide a more sensible result. Will try and make some changes to account for this!
I thought I'd replied to this, sorry! Yes, I agree that this is a bug, as there's still information when we have empty dataframes.
Looking at this a bit more, I think there are 2 updates to be made. The first is that it shouldn't error. The second is that it would potentially add value to provide a summary of what is actually in each dataset in the report, e.g. dataset A consists of 10 rows, 6 columns etc.
If you agree @kieranjmartin then I will probably just use this ticket to solve part 1 and will make a separate ticket for part 2 as that is a slightly different set of changes.
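The per-dataset summary suggested above could look roughly like this in base R. It is only a sketch of the idea: `describe_dataset` is a hypothetical helper, not part of diffdf, and the wording of the message is an assumption.

```r
# Hypothetical helper (not a diffdf function): one-line description
# of a dataset's dimensions, which also works for zero-row data frames
describe_dataset <- function(df, name) {
  sprintf("%s consists of %d rows, %d columns", name, nrow(df), ncol(df))
}

# Illustrative empty data frame with two columns
empty_df <- data.frame(id = integer(0), value = character(0))
describe_dataset(empty_df, "dataset A")
```

Because `nrow()` and `ncol()` are well defined for empty data frames, a summary like this needs no special handling for the zero-row case.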
Happy to approach it that way, a new issue to discuss how much information to give would be useful. There are certain checks you can do when comparing empty datasets if they have columns, for instance
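As an illustration of the kind of check that still makes sense with zero rows, column names and column classes can be compared directly. The sketch below uses base R only. `compare_columns` is a hypothetical helper and does not reflect how diffdf performs these checks internally.

```r
# Hypothetical helper (not a diffdf function): compare column names and
# classes of two data frames; both checks are meaningful with zero rows
compare_columns <- function(base, compare) {
  common <- intersect(names(base), names(compare))
  # Take the first class of each shared column in both data frames
  base_class <- vapply(base[common], function(x) class(x)[1], character(1))
  comp_class <- vapply(compare[common], function(x) class(x)[1], character(1))
  list(
    only_in_base = setdiff(names(base), names(compare)),
    only_in_compare = setdiff(names(compare), names(base)),
    class_mismatch = common[base_class != comp_class]
  )
}

# Illustrative empty data frames: `id` differs in class, `extra` exists only in b
a <- data.frame(id = integer(0), value = character(0), stringsAsFactors = FALSE)
b <- data.frame(id = character(0), value = character(0), extra = numeric(0),
                stringsAsFactors = FALSE)
compare_columns(a, b)
```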
~Re-reading the conversation here it looks like there was some confusion around which issue tracks what. To clarify I will use to track the null dataset issue and will use #100 to track the dataset summary enhancement.~