Update Calculator Interface
The current Calculator interface enforces Function<Exhibit, Iterable<Obs>>; update this to Function<Exhibit, Exhibit>. Also, rename Calculator to Functor.
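For readers skimming the issue, here is a minimal sketch of the proposed shape. The `Exhibit` below is a hypothetical stand-in (an immutable map of frame name to values), not the project's real class, and `doubler`, `frame` and `with` are illustrative names only.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class FunctorSketch {
    /** Hypothetical stand-in for Exhibit: an immutable map of frame name to values. */
    static final class Exhibit {
        private final Map<String, List<Integer>> frames;

        Exhibit(Map<String, List<Integer>> frames) {
            this.frames = Collections.unmodifiableMap(new LinkedHashMap<>(frames));
        }

        List<Integer> frame(String name) {
            return frames.get(name);
        }

        /** Returns a copy of this exhibit with one extra (or replaced) frame. */
        Exhibit with(String name, List<Integer> values) {
            Map<String, List<Integer>> copy = new LinkedHashMap<>(frames);
            copy.put(name, values);
            return new Exhibit(copy);
        }
    }

    /** The proposed shape: an exhibit goes in, an exhibit comes out. */
    interface Functor extends Function<Exhibit, Exhibit> { }

    /** Example functor: reads frame "in" and adds a frame "doubled" next to it. */
    static Functor doubler() {
        return exhibit -> exhibit.with(
            "doubled",
            exhibit.frame("in").stream().map(v -> v * 2).collect(Collectors.toList()));
    }
}
```

Because the input and output types match, the result of one functor can be fed straight into another, which is not possible when the output is an Iterable<Obs>.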
Attached code for an initial pass at this. Here are the highlights broken down per module:
Core
- Migrated some interfaces to abstract base classes for code re-use (hashCode, equals)
- I'm using a hard-coded array of primes for some hashCode implementations (e.g. here); not sure if there's a better way.
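On the hashCode question, one possible alternative is `java.util.Objects.hash`, which combines fields with the standard 31-multiplier scheme (it delegates to `Arrays.hashCode`) and so avoids maintaining a prime table by hand. A minimal sketch, where `HashSketchAttribute` and its fields are hypothetical and not taken from the attached code:

```java
import java.util.Objects;

/** Hypothetical value class; the name and fields are illustrative only. */
final class HashSketchAttribute {
    private final String name;
    private final String type;

    HashSketchAttribute(String name, String type) {
        this.name = Objects.requireNonNull(name);
        this.type = Objects.requireNonNull(type);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof HashSketchAttribute)) return false;
        HashSketchAttribute other = (HashSketchAttribute) o;
        return name.equals(other.name) && type.equals(other.type);
    }

    @Override
    public int hashCode() {
        // Hashes the same fields that equals compares, so the
        // equals/hashCode contract holds without a hand-picked prime per class.
        return Objects.hash(name, type);
    }
}
```

Whether this fits depends on the base classes involved; `Objects.hash` allocates a varargs array on every call, which may matter on hot paths.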
Avro/MongoDB/Thrift
- Mainly just updates due to the interface changes in Core
SQL
- Updated the functor to return an exhibit with a specified name (default provided)
- I want to spend some time thinking about the interface, and if we should expose intermediate results from this snippet.
JS
- Added support for more extensible conversions between JS types and full exhibits.
- Lots of good test cases here
Octave
- Updated the Functor, quite pleased with how it turned out. I've added the module back here for the PR, will drop it out once the PR is done.
- Created a PR to track getting the upstream javaoctavelibrary put into a central repository.
Renjin
- Functor changes are pending
Hive
- ~~Haven't tested the UDFs yet. I'd really like to add unit tests for Hive, can't find a good library for it.~~ #10
- Hive UDF: right now the implementation exposes the default frame as the result of the UDFs, i.e. same as before. We should migrate this to expose the resulting exhibit instead
- Hive UDF: Expose option to name resulting frames in the exhibit created
Server
- The last two Hive points also apply to the Server
- The Function class in the Server can probably be removed by slightly changing the Functor interface
ETL
- I've put in a temporary hack (see MIGRATION_UTILITIES) to make the new Functor interface compatible with the earlier Calculator interface.
- I haven't spent enough time on this module. Need to see how to genericize the constructs here to fully use the new Functor interface.
Spark
- Applying a functor to ExhibitRDD now returns an ExhibitRDD (instead of DataFrame)
Misc pending items
- [Core] Pivot doesn't need to take a base function
- [Core] Add a chain-able functor construct
- [Core] Consider FrameFunctor of some kind: Function<Frame,Frame> or Function<Exhibit,Frame>
- [JS] Don't use composite exhibit, convert to using Builder-pattern like Octave
- [Spark] Add vectors to ExhibitRDD
- ~~[All] File header cleanup (remove reference to creation time/author), add License headers~~
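For the chain-able functor item in the list above, one possible starting point is plain `Function.andThen`, since the input and output types now match. The sketch below is generic over `T` so it does not depend on the real Exhibit class; `ChainSketch` and `chain` are hypothetical names, not part of the attached code.

```java
import java.util.Arrays;
import java.util.function.Function;

class ChainSketch {
    /**
     * Composes same-typed steps left to right: chain(f, g).apply(x) == g.apply(f.apply(x)).
     * With no steps, the result is the identity function.
     */
    @SafeVarargs
    static <T> Function<T, T> chain(Function<T, T>... steps) {
        return Arrays.stream(steps).reduce(Function.identity(), Function::andThen);
    }
}
```

With functors typed as Function<Exhibit, Exhibit>, a call such as `ChainSketch.chain(first, second)` would itself have that type, so a chain can be passed anywhere a single functor is expected.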
General thought: these changes are a bit too massive for me to drink in all at once. Breaking them up into a series of commits (core first, then avro/mongo/thrift, then server/etl/hive) would be really helpful.
Thanks for the suggestions! Made the fixes for the stuff you pointed out. I'll rebase & break up the commits in a more consumable order.
Re: the spacing, I'll try to be more aware of following the convention in the code as I add stuff. Do you know of a linter/mvn/intellij plugin which checks for consistency?