MrSim
Compliments, it is a good and clear project
I was looking for a small project that lets me work with distributed jobs. Honestly, I dislike Hadoop/Spark/Flink because they are not designed with a modular strategy. If you want to test locally, you can't (or you lose time configuring the system instead of studying your logic); if you want to embed the library in your own logic, you can't; if you want to apply an algorithm in a different way than the framework expects, you can't. I will study this project to learn the concepts behind it. Do you have any suggestions for improving the project if I wanted to use it in production?
Hi! Thanks for your feedback! I haven't worked on this library for a long time, but I am pretty sure there are things to be improved. As to exactly what, I'd have to think about it for some time. Any suggestions on your side? (And sorry for the long delay before answering, I don't visit that page very often.)
In my opinion, a good software designer should develop every independent piece of logic as an independent library. Spark doesn't do this. In practice, there are many contexts where a map/reduce framework could be embedded. For example, you may have developed a server that performs a specific task but need a way to share the computation (tied to your embedded logic) across a cluster. In other contexts, map/reduce could also run on small machines or devices. Some suggestions:
- optimize the code (also for parallel map/reduce)
- make map/reduce work with iterators/streams (for example, you receive two different iterators of indefinite size and produce a resulting iterator through a reduction function, perhaps with a configurable window size)
- another possible area of study could be map/reduce on the GPU
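To make the iterator/stream idea more concrete, here is a minimal sketch in Python. It is not MrSim's API and is not based on its code; the names `round_robin`, `windowed_map_reduce`, `mapper`, `reducer` and `window_size` are all hypothetical, chosen only for illustration. It assumes the mapper turns one record into `(key, value)` pairs and the reducer folds two values for the same key into one. Inputs are interleaved so that an endless iterator cannot starve the others, and one result is emitted per window.

```python
from itertools import islice
from operator import add


def round_robin(*iterators):
    """Interleave several (possibly endless) iterators, one item from each in turn."""
    active = [iter(it) for it in iterators]
    while active:
        still_active = []
        for it in active:
            try:
                item = next(it)
            except StopIteration:
                continue  # this input is exhausted, drop it
            still_active.append(it)
            yield item
        active = still_active


def windowed_map_reduce(sources, mapper, reducer, window_size):
    """Lazily yield one {key: reduced_value} dict per window of input records.

    Hypothetical sketch: `mapper(record)` returns an iterable of (key, value)
    pairs, `reducer(a, b)` combines two values that share a key.
    """
    merged = round_robin(*sources)
    while True:
        window = list(islice(merged, window_size))
        if not window:
            return  # all inputs exhausted
        groups = {}
        for record in window:
            for key, value in mapper(record):
                groups[key] = reducer(groups[key], value) if key in groups else value
        yield groups


# Usage: word count over two line iterators, two records per window.
def split_words(line):
    return ((word, 1) for word in line.split())


lines_a = iter(["a b", "b c"])
lines_b = iter(["a a"])
for counts in windowed_map_reduce([lines_a, lines_b], split_words, add, window_size=2):
    print(counts)
# {'a': 3, 'b': 1}
# {'b': 1, 'c': 1}
```

Because the result is itself a generator, only one window is held in memory at a time, which is what would make this approach usable on small machines or with inputs of indefinite size.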