
DistributedCacheFile does not work with Execution

Open · johnynek opened this issue 9 years ago • 1 comment

HadoopMode.newFlowConnector ignores (intentionally) the jobConf because we don't want creepy mutation confusing what the Config is.

The problem is, Execution gives no programmatic way to change the Config inside the flow, so DistributedCacheFile, which assumes it can mutate the config, does not work.

There are at least two ways forward:

  1. Look for the distributed cache keys in the JobConf and copy them into the Config before running (ugly).

  2. Make a thin Execution-based wrapper for the distributed cache that encapsulates this exception to the rule, so you have: addToDistributedCache(file): Execution[Unit]. I prefer this route, but it might be more work.
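
To make option 2 concrete, here is a rough sketch of the shape such a wrapper could take. It deliberately does not use Scalding's real classes: `Config` and `Execution` below are toy stand-ins written only for this illustration, `DistCache` and `Demo` are hypothetical names, and the cache key is assumed to be Hadoop 2's `mapreduce.job.cache.files` (a comma-separated list of URIs). The only point is that the Config change becomes a value you compose, instead of a side effect on a JobConf.

```scala
// Toy stand-in for Scalding's Config: an immutable key/value map.
final case class Config(toMap: Map[String, String]) {
  def get(key: String): Option[String] = toMap.get(key)
  def +(kv: (String, String)): Config = Config(toMap + kv)
}

// Toy stand-in for Execution: a function that threads a Config through
// and returns a result. The real Execution is more involved than this.
final case class Execution[A](run: Config => (Config, A)) {
  def map[B](f: A => B): Execution[B] =
    Execution { c => val (c2, a) = run(c); (c2, f(a)) }
  def flatMap[B](f: A => Execution[B]): Execution[B] =
    Execution { c => val (c2, a) = run(c); f(a).run(c2) }
}

object DistCache {
  // Assumed key name (Hadoop 2); older Hadoop versions use a different key.
  val CacheFilesKey = "mapreduce.job.cache.files"

  // The one sanctioned place where the Config is changed: append the URI
  // to the comma-separated cache list and return the updated Config.
  def addToDistributedCache(uri: String): Execution[Unit] =
    Execution { c =>
      val updated = c.get(CacheFilesKey) match {
        case Some(existing) if existing.nonEmpty => s"$existing,$uri"
        case _                                   => uri
      }
      (c + (CacheFilesKey -> updated), ())
    }
}

object Demo {
  // Compose two registrations; nothing is mutated in place.
  val job: Execution[Unit] = for {
    _ <- DistCache.addToDistributedCache("hdfs:///tmp/a.txt#a")
    _ <- DistCache.addToDistributedCache("hdfs:///tmp/b.txt#b")
  } yield ()

  // Run against an empty Config and keep the resulting Config.
  val finalConfig: Config = job.run(Config(Map.empty))._1
}
```

In this toy model the caller never touches a JobConf; the cache entries show up in the Config that the run produces, which is where the flow connector would need to read them from.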

johnynek · Mar 24 '15 00:03