Update Calculator Interface

Open prateek opened this issue 10 years ago • 3 comments

The current Calculator interface enforces Function<Exhibit, Iterable<Obs>>. Update this to Function<Exhibit, Exhibit>. Also, rename Calculator to Functor.
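
As a rough illustration of the requested change, the two shapes can be put side by side. This is only a sketch: `Exhibit` and `Obs` below are simplified, hypothetical stand-ins for the project's real types, `doubling` is an invented example, and `java.util.function.Function` is assumed (the project may well use a different `Function` type).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

class FunctorSketch {
    // Hypothetical stand-in for the project's Obs type.
    static final class Obs {
        final double value;
        Obs(double value) { this.value = value; }
    }

    // Hypothetical stand-in for the project's Exhibit type.
    static final class Exhibit {
        final String name;
        final List<Obs> rows;
        Exhibit(String name, List<Obs> rows) {
            this.name = name;
            this.rows = Collections.unmodifiableList(new ArrayList<>(rows));
        }
    }

    // Current shape: an exhibit goes in, a flat sequence of observations comes out.
    interface Calculator extends Function<Exhibit, Iterable<Obs>> {}

    // Proposed shape: an exhibit goes in, an exhibit comes out.
    interface Functor extends Function<Exhibit, Exhibit> {}

    // Invented example functor: doubles every value and returns a new, named exhibit.
    static Functor doubling(String resultName) {
        return in -> {
            List<Obs> out = new ArrayList<>();
            for (Obs o : in.rows) {
                out.add(new Obs(o.value * 2));
            }
            return new Exhibit(resultName, out);
        };
    }

    public static void main(String[] args) {
        Exhibit input = new Exhibit("input", Arrays.asList(new Obs(1.0), new Obs(2.5)));
        Exhibit result = doubling("doubled").apply(input);
        System.out.println(result.name + " has " + result.rows.size() + " rows");
    }
}
```

Because the output type now matches the input type, the result of one functor can be fed directly into another.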

prateek avatar Aug 10 '15 07:08 prateek

Attached code for an initial pass at this. Here are the highlights broken down per module:

Core

  • Migrated some interfaces to abstract base classes for code reuse (hashCode, equals)
  • I'm using a hard-coded array of primes for some hashCode implementations (e.g.: here); not sure if there's a better way.
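
One possible alternative to a hand-maintained array of primes is the JDK's `java.util.Objects.hash`, which combines field hashes using the same scheme as `List.hashCode`. The sketch below assumes Java 7 or later; `AbstractNamed`, `name()`, `columns()`, and `of` are hypothetical names chosen for illustration, not the project's actual classes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class HashCodeSketch {
    // Hypothetical abstract base class that centralizes equals/hashCode for its subclasses.
    abstract static class AbstractNamed {
        abstract String name();
        abstract List<String> columns();

        @Override
        public final boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            if (!(other instanceof AbstractNamed)) {
                return false;
            }
            AbstractNamed that = (AbstractNamed) other;
            return Objects.equals(name(), that.name())
                && Objects.equals(columns(), that.columns());
        }

        @Override
        public final int hashCode() {
            // Objects.hash combines the field hashes; no custom primes are needed.
            return Objects.hash(name(), columns());
        }
    }

    // Illustrative factory producing a concrete instance.
    static AbstractNamed of(final String name, final String... columns) {
        final List<String> cols = Arrays.asList(columns);
        return new AbstractNamed() {
            @Override String name() { return name; }
            @Override List<String> columns() { return cols; }
        };
    }

    public static void main(String[] args) {
        AbstractNamed a = of("frame", "x", "y");
        AbstractNamed b = of("frame", "x", "y");
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
    }
}
```

Whether this is actually better depends on how hot the hashCode paths are, since `Objects.hash` allocates a varargs array on each call.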

Avro/MongoDB/Thrift

  • Mainly just updates due to the interface changes in Core

SQL

  • Updated the functor to return an exhibit with a specified name (default provided)
  • I want to spend some time thinking about the interface, and if we should expose intermediate results from this snippet.

JS

  • Added support for more extensible conversions between JS types and full exhibits.
  • Lots of good test cases here

Octave

  • Updated the Functor, quite pleased with how it turned out
  • I've added the module back here for the PR, will drop it out once the PR is done.
  • Created a PR to track getting the upstream javaoctave library put into a central repository.

Renjin

  • Functor changes are pending

Hive

  • ~~Haven't tested the UDFs yet. I'd really like to add unit tests for Hive, can't find a good library for it.~~ #10
  • Hive UDF: right now the implementation exposes the default frame as the result of the UDFs, i.e. same as before. We should migrate this to expose the resulting exhibit instead.
  • Hive UDF: Expose option to name resulting frames in the exhibit created

Server

  • The last two Hive points also apply to the Server
  • The Function class in the Server can probably be removed by slightly changing the Functor interface

ETL

  • I've put in a temporary hack (see MIGRATION_UTILITIES) to make the new Functor interface compatible with the earlier Calculator interface.
  • I haven't spent enough time on this module. Need to see how to genericize the constructs here to fully use the new Functor interface.
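
A compatibility shim of this kind might look roughly like the following. This is not the code in MIGRATION_UTILITIES; it is a guess at the general shape, and `Exhibit`, `Obs`, `asCalculator`, and `keepPositive` are hypothetical stand-ins, with `java.util.function.Function` assumed as the base type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

class MigrationSketch {
    // Hypothetical stand-in for the project's Obs type.
    static final class Obs {
        final double value;
        Obs(double value) { this.value = value; }
    }

    // Hypothetical stand-in for the project's Exhibit type.
    static final class Exhibit {
        final List<Obs> rows;
        Exhibit(List<Obs> rows) {
            this.rows = Collections.unmodifiableList(new ArrayList<>(rows));
        }
    }

    interface Calculator extends Function<Exhibit, Iterable<Obs>> {}

    interface Functor extends Function<Exhibit, Exhibit> {}

    // Adapter: run the new-style functor, then expose only the observations
    // of the resulting exhibit, which is what old Calculator callers expect.
    static Calculator asCalculator(Functor functor) {
        return exhibit -> functor.apply(exhibit).rows;
    }

    // Invented example functor: keeps only strictly positive observations.
    static Functor keepPositive() {
        return in -> {
            List<Obs> kept = new ArrayList<>();
            for (Obs o : in.rows) {
                if (o.value > 0) {
                    kept.add(o);
                }
            }
            return new Exhibit(kept);
        };
    }

    public static void main(String[] args) {
        Exhibit input = new Exhibit(Arrays.asList(new Obs(-1.0), new Obs(3.0)));
        Calculator legacy = asCalculator(keepPositive());
        for (Obs o : legacy.apply(input)) {
            System.out.println(o.value);
        }
    }
}
```

An adapter like this discards everything in the result exhibit except its observations, which is why it only works as a temporary bridge.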

Spark

  • Applying a functor to ExhibitRDD now returns an ExhibitRDD (instead of DataFrame)

Misc pending items

  • [Core] Pivot doesn't need to take a base function
  • [Core] Add a chain-able functor construct
  • [Core] Consider FrameFunctor of some kind: Function<Frame,Frame> or Function<Exhibit,Frame>
  • [JS] Don't use composite exhibit, convert to using Builder-pattern like Octave
  • [Spark] Add vectors to ExhibitRDD
  • ~~[All] File header cleanup (remove reference to creation time/author), add License headers~~
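
For the chain-able functor item, one option is a default method on the interface so that composition stays typed as a Functor. The sketch assumes Java 8 default methods and `java.util.function.Function`; the `then` method, the `addFrame` helper, and the simplified `Exhibit` (a list of frame names) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

class ChainSketch {
    // Hypothetical stand-in: an exhibit reduced to an ordered list of frame names.
    static final class Exhibit {
        final List<String> frames;
        Exhibit(List<String> frames) {
            this.frames = Collections.unmodifiableList(new ArrayList<>(frames));
        }
    }

    interface Functor extends Function<Exhibit, Exhibit> {
        // Function.andThen would return a plain Function; this keeps the Functor type.
        default Functor then(Functor next) {
            Objects.requireNonNull(next, "next");
            return in -> next.apply(this.apply(in));
        }
    }

    // Invented example functor: returns a copy of the exhibit with one more frame.
    static Functor addFrame(String frameName) {
        return in -> {
            List<String> frames = new ArrayList<>(in.frames);
            frames.add(frameName);
            return new Exhibit(frames);
        };
    }

    public static void main(String[] args) {
        Functor pipeline = addFrame("a").then(addFrame("b")).then(addFrame("c"));
        Exhibit result = pipeline.apply(new Exhibit(Collections.<String>emptyList()));
        System.out.println(result.frames);
    }
}
```

Functors are applied left to right, so the first functor in the chain sees the original exhibit.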

prateek avatar Aug 10 '15 07:08 prateek

General thought: these changes are a bit too massive for me to take in all at once. Breaking them up into a series of commits (core first, then avro/mongo/thrift, then server/etl/hive) would be really helpful.

jwills avatar Aug 11 '15 04:08 jwills

Thanks for the suggestions! Made the fixes for the stuff you pointed out. I'll rebase & break up the commits in a more consumable order.

Re: the spacing, I'll try to be more aware of following the convention in the code as I add stuff. Do you know of a linter/mvn/intellij plugin which checks for consistency?

prateek avatar Aug 11 '15 18:08 prateek