[Feature Request] Add real-time console report output option to compare.matches() API

Open WiktorHawrylik opened this issue 3 months ago • 2 comments

What do you think about the following feature? I can assist with the design and lead the implementation, but I would need some guidance, especially with the Snowflake and Fugue backends.

Description

In datacompy's Compare interface, the .matches() method triggers the comparison of two dataframes. Currently, users call compare.matches() to get the Boolean match status and then separately call compare.report() to get the generated, human-readable summary. It may be beneficial to stream the report information instead. For example, we could add a verbose parameter, as in compare.matches(verbose=True), that prints each relevant part of the report to the console as soon as it becomes available.

So far I have mainly reviewed SparkSQLCompare. The addition seems feasible there, since the Spark execution graph is already broken down.
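To make the idea concrete, here is a minimal sketch of the staged-output behaviour. It is not datacompy code: `ToyCompare`, `_report_stages`, and the list-of-dicts inputs are hypothetical stand-ins, and only the `verbose` parameter and the report labels come from the proposal above. The point is just that each check yields its report line as soon as it finishes, and `matches(verbose=True)` prints the line immediately.

```python
from collections import Counter
from typing import Dict, Iterator, List, Tuple


class ToyCompare:
    """Hypothetical stand-in for a Compare class (NOT datacompy's implementation)."""

    def __init__(self, df1: List[Dict], df2: List[Dict], join_column: str) -> None:
        self.df1 = df1
        self.df2 = df2
        self.join_column = join_column

    def _report_stages(self) -> Iterator[Tuple[str, bool]]:
        """Yield (report line, stage passed?) as each check completes."""
        key = self.join_column
        keys1 = Counter(row[key] for row in self.df1)
        keys2 = Counter(row[key] for row in self.df2)

        # Stage 1: duplicate join keys. Informational only in this sketch.
        has_dupes = any(n > 1 for n in keys1.values()) or any(
            n > 1 for n in keys2.values()
        )
        yield f"Any duplicates on match values: {'Yes' if has_dupes else 'No'}", True

        # Stage 2: rows present on one side only.
        only1 = sum(1 for row in self.df1 if row[key] not in keys2)
        only2 = sum(1 for row in self.df2 if row[key] not in keys1)
        yield f"Number of rows in df1 but not in df2: {only1}", only1 == 0
        yield f"Number of rows in df2 but not in df1: {only2}", only2 == 0

        # Stage 3: value comparison on shared keys (the expensive part in a
        # real engine). With duplicate keys, the last row per key is used.
        by_key2 = {row[key]: row for row in self.df2}
        unequal = sum(
            1 for row in self.df1 if row[key] in by_key2 and row != by_key2[row[key]]
        )
        yield f"Number of rows with some compared columns unequal: {unequal}", unequal == 0

    def matches(self, verbose: bool = False) -> bool:
        """Return the match status; with verbose=True, print lines as they arrive."""
        all_ok = True
        for line, ok in self._report_stages():
            if verbose:
                print(line, flush=True)  # flush so the line shows up right away
            all_ok = all_ok and ok
        return all_ok


if __name__ == "__main__":
    left = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
    right = [{"id": 1, "v": "a"}, {"id": 2, "v": "x"}]
    print(ToyCompare(left, right, "id").matches(verbose=True))
```

Because the stages come out of a generator, a caller watching the console can interrupt the run after an early line such as the duplicates check, before the expensive value comparison starts. Whether the Spark implementation can be split at the same points is something I would still need to confirm.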

Motivation

  1. This feature would improve the user experience, especially in interactive or debugging scenarios where instant feedback in the terminal is valuable. For example, an engineer could begin investigating issues as soon as they are reported. In Spark, comparing large dataframes can take hours and consume significant DBUs (cost). Early report messages often provide actionable information: for example, "Any duplicates on match values: Yes" motivates a duplicates investigation, and "Number of rows in df1 but not in df2" highlights missing data.

  2. Cost savings: If a critical data issue is detected early, the job can be terminated immediately.

  3. This approach matches what users expect from similar Python tools that let you toggle console output via a parameter (for example, the verbose parameter on many scikit-learn estimators).

It also enhances code interactivity and readability for quick checks and educational contexts (e.g., in Jupyter Notebooks and scripts).

WiktorHawrylik, Oct 02 '25 12:10

@WiktorHawrylik I'm open to the idea. Just so I can understand a bit more: would you happen to have a sense of what the implementation would look like here?

fdosani, Oct 03 '25 14:10

@fdosani cool, let me draft a PR.

WiktorHawrylik, Oct 09 '25 13:10