scalding
DistributedCacheFile does not work with Execution
HadoopMode.newFlowConnector intentionally ignores the jobConf, because we don't want creepy mutation confusing what the Config is.
The problem is that Execution gives no programmatic way to change the Config inside the flow, so DistributedCacheFile, which assumes it can mutate the config, does not work.
There are at least two ways forward:
- Look for the distributed cache keys in the jobConf and copy them into the Config before running (ugly).
- Make a thin Execution-based wrapper for the distributed cache that encapsulates this exception to the rule, so you have: addToDistributedCache(file): Execution[Unit]. I prefer this route, but it might be more work.
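To make the second option concrete, here is a rough, dependency-free sketch of the shape it could take. Nothing in it is scalding's real API: `Config` is modelled as an immutable `Map`, `Exec` is a toy stand-in for `Execution` that just threads a config through, and `getConfig` and `demo` are hypothetical helpers for illustration. The only real-world name is Hadoop's `mapred.cache.files` key (newer Hadoop also uses `mapreduce.job.cache.files`).

```scala
// Toy stand-ins, NOT scalding's real types: Config is an immutable Map and
// Exec[T] is a function that threads a Config through a computation.
object DistCacheSketch {
  type Config = Map[String, String]

  final case class Exec[T](run: Config => (Config, T)) {
    def flatMap[U](f: T => Exec[U]): Exec[U] = Exec { c0 =>
      val (c1, t) = run(c0)
      f(t).run(c1)
    }
    def map[U](f: T => U): Exec[U] = flatMap(t => Exec(c => (c, f(t))))
  }

  // Hadoop's classic distributed-cache key; its value is a comma-separated
  // list of URIs.
  val CacheFilesKey = "mapred.cache.files"

  // Hypothetical addToDistributedCache: the single, explicit place where the
  // Config is allowed to change. It returns a new Config; nothing is mutated.
  def addToDistributedCache(uri: String): Exec[Unit] = Exec { conf =>
    val updated = conf.get(CacheFilesKey) match {
      case Some(existing) if existing.nonEmpty => existing + "," + uri
      case _                                   => uri
    }
    (conf + (CacheFilesKey -> updated), ())
  }

  // Hypothetical helper: read the Config as the rest of the job would see it.
  val getConfig: Exec[Config] = Exec(c => (c, c))

  // Usage: register two files, then inspect the resulting cache entry.
  val demo: String = {
    val job = for {
      _    <- addToDistributedCache("hdfs:///tmp/a.txt")
      _    <- addToDistributedCache("hdfs:///tmp/b.txt")
      conf <- getConfig
    } yield conf(CacheFilesKey)
    job.run(Map.empty)._2
  }
}
```

The point of this shape is that the config change becomes an ordinary value composed with the rest of the job, so the "no hidden mutation" rule in newFlowConnector can stay as it is. How a real implementation would hook into Execution is left open here.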
Any chance this could be fixed in the future?