expose a way to modify reports to allow distributed hive to tag reports with partition ids

Open robfitzgerald opened this issue 4 years ago • 0 comments

in hive-distributed, we run p separate instances of HIVE, each of which writes to its own output file. in this setting, it may be helpful to tag each report row with its corresponding partition_id so that the information is not lost when the files are joined.

to support this, Reporter could take a tagFn: Optional[Callable[[Report], Report]] = None in its constructor. when tagFn is not None, the Reporter.file_report method would call it to modify the incoming Report.

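in the hive-distributed setting, the callable passed to Reporter could be a lambda like this (dict.update returns None, so the trailing `or r` is what makes the lambda return the Report, as the Callable[[Report], Report] signature requires):

Reporter(tagFn=lambda r: r.report.update({ "partition_id": p_id }) or r)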

this is also open-ended for any other reasons we may need to tag Reports.
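a minimal sketch of how the proposal could fit together, under stated assumptions: the `Report` and `Reporter` classes below are simplified stand-ins, not HIVE's actual classes. only the `tagFn` parameter, the `file_report` method name, and the `report` dict field come from the description above. the in-memory `reports` list and the `tag_with_partition` helper are hypothetical, added only for illustration.

```python
from typing import Callable, Dict, List, NamedTuple, Optional


class Report(NamedTuple):
    # simplified stand-in for HIVE's Report (assumption: it carries a `report` dict)
    report: Dict[str, object]


class Reporter:
    # simplified stand-in for HIVE's Reporter; only tagFn and file_report are from the issue
    def __init__(self, tagFn: Optional[Callable[[Report], Report]] = None):
        self.tagFn = tagFn
        self.reports: List[Report] = []  # hypothetical buffer in place of file output

    def file_report(self, report: Report) -> None:
        # apply the optional tagging function before storing the report
        if self.tagFn is not None:
            report = self.tagFn(report)
        self.reports.append(report)


def tag_with_partition(p_id: int) -> Callable[[Report], Report]:
    # hypothetical helper: builds a tagFn that adds partition_id without mutating the input
    def tag(r: Report) -> Report:
        return r._replace(report={**r.report, "partition_id": p_id})
    return tag


reporter = Reporter(tagFn=tag_with_partition(3))
reporter.file_report(Report(report={"vehicle_id": "v1"}))
```

returning a new Report from the tag function, instead of mutating the dict in place, avoids surprising any other code that still holds the original Report. a Reporter built without tagFn stores reports unchanged, so existing callers would not be affected.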

robfitzgerald, Mar 31 '21 16:03