hive
expose a way to modify reports to allow distributed hive to tag reports with partition ids
in hive-distributed, we run p separate instances of HIVE, each of which writes to its own output file. in this setting, it may be helpful to tag each report row with its corresponding partition_id so that the information is not lost when the files are joined.
to support this, Reporter could take a tagFn: Optional[Callable[[Report], Report]] = None in its constructor. if tagFn is not None, it would be called within the Reporter.file_report method and could be used to modify the incoming Report.
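in the hive-distributed setting, the tagFn callable could be a lambda like this (dict.update returns None, so the lambda ends with "or r" to return the Report and match the Callable[[Report], Report] signature):
Reporter(tagFn=lambda r: r.report.update({ "partition_id": p_id }) or r)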
this is also open-ended for any other reasons we may need to tag Reports.
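a minimal sketch of how the hook described above might look. the Report and Reporter classes here are hypothetical stand-ins, not HIVE's actual classes: the only things assumed from the proposal are the tagFn constructor argument, the file_report method, and a dict-like report attribute on Report. the partition_tagger helper, the in-memory reports list, and the vehicle_id field are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Report:
    # hypothetical stand-in for HIVE's Report; only the `report` dict is assumed
    report: Dict[str, Any] = field(default_factory=dict)


class Reporter:
    # hypothetical stand-in for HIVE's Reporter, reduced to the tagging hook
    def __init__(self, tagFn: Optional[Callable[[Report], Report]] = None):
        self.tagFn = tagFn
        self.reports: List[Report] = []  # stands in for the real output sink

    def file_report(self, report: Report) -> None:
        # apply the tag function, if one was given, before storing the report
        if self.tagFn is not None:
            report = self.tagFn(report)
        self.reports.append(report)


def partition_tagger(p_id: int) -> Callable[[Report], Report]:
    # build a tagFn that stamps every report with the given partition id
    def tag(r: Report) -> Report:
        r.report["partition_id"] = p_id
        return r  # return the Report so the Callable[[Report], Report] contract holds
    return tag


# one Reporter per partition in the distributed setting
reporter = Reporter(tagFn=partition_tagger(3))
reporter.file_report(Report(report={"vehicle_id": "v1"}))

# without a tagFn, reports pass through unchanged
untagged = Reporter()
untagged.file_report(Report(report={"vehicle_id": "v1"}))
```

a named function is used instead of a lambda here so that the tagFn can mutate the report and return it in two plain statements. defaulting tagFn to None keeps the non-distributed case unchanged.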