gleam icon indicating copy to clipboard operation
gleam copied to clipboard

Added arguments to mapper.

Open jorgecarleitao opened this issue 5 years ago • 7 comments

This is a backward-incompatible change that modifies the signature of all mappers from (row) to (row, ...args) and Map(name, f) to Map(name, f, ...args), where args are passed by the driver to the executors.

This allows for more complex map operations where the maps depend on intermediary operations / arguments known to the driver.

I used json for serializing/deserializing the arguments, as per comments in #185. The tests are currently failing because marshaling/unmarshaling changes the argument's types (int64 -> float64). In general, I believe that we want to keep the types as part of the serialization / de-serialization, which hints that the json may not be sufficient for our needs. However. I do not have a strong opinion on this: @chrislusf , what do you think?

I decided to use "github.com/stretchr/testify/assert" for the testing, but I have no strong opinions about it. My experience is that it is easy to tell when something is not equal.

jorgecarleitao avatar Jan 30 '20 05:01 jorgecarleitao

Thanks for the PR!

How about adding a new type of Mapper, e.g., MapperWithArgs, to avoid incompatible changes?

chrislusf avatar Jan 30 '20 05:01 chrislusf

And what would be the signature of Map? Would we add a MapWithArgs also?

jorgecarleitao avatar Jan 30 '20 05:01 jorgecarleitao

And what would be the signature of Map? Would we add a MapWithArgs also?

Yes, that seems cleaner.

chrislusf avatar Jan 30 '20 06:01 chrislusf

Looking at the code, this seems a significant change: dependencies on Mapper's signature:

  • Map
  • RegisterMapper
  • GetMapper
  • mappers map[MapperId]MapperObject
  • type MapperObject struct
  • gleamTaskOption
  • processMapper
  • doProcessMapper

Do we create an equivalent function for each of these with *WithArgs, or do we make their arguments interface{}?

jorgecarleitao avatar Jan 30 '20 06:01 jorgecarleitao

yes. a big internal change.

Actually I am also thinking to provide mapper and reducer objects that can have internal states. This is something related, and can be re-engineered together.

chrislusf avatar Jan 30 '20 18:01 chrislusf

I am not sure about the side-effects of adding state to map reducers: doesn't map-reduce architecture relies on mappers being pure functions for many of the optimizations? e.g., if a mapper has a state, that state has to be shared among the executors, which requires a lock to the state on the master and all the consequent communication between executors.

I would consider your idea to deserve an issue on its own.

Coming back to the mapper arguments: I propose to add arguments to the mappers, which is the natural way to support dynamic mappers in a compiled language. Could you propose a design for this? The design I proposed is backward compatible. Are you fine with adding all new methods for "argumented" mappers?

jorgecarleitao avatar Feb 01 '20 07:02 jorgecarleitao

Yes, your proposal looks fine, even though it adds another set of functions.

chrislusf avatar Feb 01 '20 08:02 chrislusf