gleam
gleam copied to clipboard
Added arguments to mapper.
This is a backward-incompatible change that modifies the signature of all mappers from (row) to (row, ...args) and Map(name, f) to Map(name, f, ...args), where args are passed by the driver to the executors.
This allows for more complex map operations where the maps depend on intermediary operations / arguments known to the driver.
I used json for serializing/deserializing the arguments, as per comments in #185. The tests are currently failing because marshaling/unmarshaling changes the argument's types (int64 -> float64). In general, I believe that we want to keep the types as part of the serialization / de-serialization, which hints that the json may not be sufficient for our needs. However. I do not have a strong opinion on this: @chrislusf , what do you think?
I decided to use "github.com/stretchr/testify/assert" for the testing, but I have no strong opinions about it. My experience is that it is easy to tell when something is not equal.
Thanks for the PR!
How about adding a new type of Mapper, e.g., MapperWithArgs, to avoid incompatible changes?
And what would be the signature of Map? Would we add a MapWithArgs also?
And what would be the signature of
Map? Would we add aMapWithArgsalso?
Yes, that seems cleaner.
Looking at the code, this seems a significant change: dependencies on Mapper's signature:
MapRegisterMapperGetMappermappers map[MapperId]MapperObjecttype MapperObject structgleamTaskOptionprocessMapperdoProcessMapper
Do we create an equivalent function for each of these with *WithArgs, or do we make their arguments interface{}?
yes. a big internal change.
Actually I am also thinking to provide mapper and reducer objects that can have internal states. This is something related, and can be re-engineered together.
I am not sure about the side-effects of adding state to map reducers: doesn't map-reduce architecture relies on mappers being pure functions for many of the optimizations? e.g., if a mapper has a state, that state has to be shared among the executors, which requires a lock to the state on the master and all the consequent communication between executors.
I would consider your idea to deserve an issue on its own.
Coming back to the mapper arguments: I propose to add arguments to the mappers, which is the natural way to support dynamic mappers in a compiled language. Could you propose a design for this? The design I proposed is backward compatible. Are you fine with adding all new methods for "argumented" mappers?
Yes, your proposal looks fine, even though it adds another set of functions.