datacompy
datacompy copied to clipboard
Make SparkCompare.report return string
Rather than make it print to stdout (or a file)?
The Compare.report returns a string by default. I think it's better to make those method behave consistently.
@dan-coates Thoughts on this since you're the driving force behind the Spark version. I'm cool with changing things around here.
Also, I think it actually makes things more convenient to return a string as users may:
- send email about the report summary
- customize logging to make it easier to parse from the extremely large spark application log
I'd be also supportive of this change!
Oh boy... I know at one point I had a well-reasoned argument for this with comparisons of what code would look like in both situations.
Other than mimicking SAS's PROC COMPARE as much as possible, I think my rationale was that you probably want to do one of two things with this output - print it, or save it. In the first case, that's one line (comparison.report()) and in the second it's two lines:
with open('my_report.txt', 'w') as report_file:
comparison.report(file=report_file)
If it's a string, it's not massively different, but it's a bit more verbose as you have to either print or write both. That said, if you really want the output as a string for other purposes (like emailing) then it's more work if you have to deal with file handles to accomplish that.
With that in mind, I would rather have the two APIs be more aligned, and I've grown to appreciate functions that return things vs. functions with side effects, so I'd be good with making it return a string.
One compromise could be to leave the spark report function mainly as is but in addition also add the string to be returned at the end.
I know this is a bit long over due but I'll add this to our backlog of things to do. I think a refactor fo the Spark API is in order.
This one has been stale for a while. I think pandas outputs a string and so does Spark. For now I think we can just leave it at that. this can be take and shipped else where. Want to keep the project and simple and streamlined as possible.