datacompy icon indicating copy to clipboard operation
datacompy copied to clipboard

Make SparkCompare.report return string

Open dclong opened this issue 5 years ago • 6 comments

Rather than make it print to stdout (or a file)?

The Compare.report returns a string by default. I think it's better to make those method behave consistently.

dclong avatar Jul 19 '20 23:07 dclong

@dan-coates Thoughts on this since you're the driving force behind the Spark version. I'm cool with changing things around here.

fdosani avatar Jul 20 '20 16:07 fdosani

Also, I think it actually makes things more convenient to return a string as users may:

  1. send email about the report summary
  2. customize logging to make it easier to parse from the extremely large spark application log

dclong avatar Jul 21 '20 00:07 dclong

I'd be also supportive of this change!

jborchma avatar Jul 21 '20 00:07 jborchma

Oh boy... I know at one point I had a well-reasoned argument for this with comparisons of what code would look like in both situations.

Other than mimicking SAS's PROC COMPARE as much as possible, I think my rationale was that you probably want to do one of two things with this output - print it, or save it. In the first case, that's one line (comparison.report()) and in the second it's two lines:

with open('my_report.txt', 'w') as report_file:
    comparison.report(file=report_file)

If it's a string, it's not massively different, but it's a bit more verbose as you have to either print or write both. That said, if you really want the output as a string for other purposes (like emailing) then it's more work if you have to deal with file handles to accomplish that.

With that in mind, I would rather have the two APIs be more aligned, and I've grown to appreciate functions that return things vs. functions with side effects, so I'd be good with making it return a string.

dan-coates avatar Jul 21 '20 05:07 dan-coates

One compromise could be to leave the spark report function mainly as is but in addition also add the string to be returned at the end.

jborchma avatar Jul 27 '20 21:07 jborchma

I know this is a bit long over due but I'll add this to our backlog of things to do. I think a refactor fo the Spark API is in order.

fdosani avatar Aug 05 '21 23:08 fdosani

This one has been stale for a while. I think pandas outputs a string and so does Spark. For now I think we can just leave it at that. this can be take and shipped else where. Want to keep the project and simple and streamlined as possible.

fdosani avatar Mar 05 '24 00:03 fdosani