daru icon indicating copy to clipboard operation
daru copied to clipboard

Follow-up of GSoC: remove obsolete parts from main gem

Open zverok opened this issue 6 years ago • 23 comments

  1. I believe after release of daru-io there should be a period of cleaning up daru of duplicating things (importers and exporters), and related outdated stuff, like mentioned here: https://github.com/SciRuby/daru/issues/404
  2. I believe daru-io should be hard dependency for daru (like rspec installs rspec-core, rspec-expectations and so on) -- maybe in future with extraction of daru-dataframe-only gem, maybe not. But the point is, if somebody just install daru and wants his DF from CSV, they don't need to think about daru-io explicitly.
  3. Currently I am not sure about parts of daru that can be removed due to daru-views existance, subject to investigate.
  4. I believe such API change is marking next major version, and probably several other things should be done before the release, like https://github.com/SciRuby/daru/issues/336, and probably https://github.com/SciRuby/daru/issues/317 and https://github.com/SciRuby/daru/issues/318
  5. And by this time we probably can do a Great Cleanup, like removing completely everything that was deprecated in last versions, and deprecate a bunch of another stuff, and process #cleanup-tagged issues, and merge some stuff like https://github.com/SciRuby/daru/pull/340 and ...

I am ready to lead the work (and do the most complicated parts myself), and propose the following (really hi-level) timeline:

  • September: finalizing GSoC, planning new release and define roadmap :thinking: :thought_balloon:
  • October: hard work :rage1: :rage2: :rage3: :rage4:
  • Beginning of November, hopefully (the earlier, the better, maybe I am overestimating the complexity of the road): Big Shiny Release :tada:

@v0dro @lokeshh @athityakumar @Shekharrajak @parthm @rainchen please feel free to share your thoughts, and notify if you are willing to take some part of the work.

zverok avatar Sep 01 '17 21:09 zverok

Probably, the work should be done on some daru-1.0 branch, and (as GitHub allows it) PRs on the road should be merged there, not in master?.. Or master would became our moving target, and stable branch should be maintained for current bugs fixing/small enchancements.

zverok avatar Sep 01 '17 21:09 zverok

Since it's usually an understanding that master (or default branch) is usually the stable branch, I feel that it'd be better if we continue on daru-1.0 branch. Regarding the port of daru-io, I'd definitely like to contribute in removing obsolete IO parts from daru. A few pointers related to this porting :

  • IMO, #save and #load methods are better suited to reside within daru.
  • All other IO methods have already been ported to daru-io, and can be removed from daru.
  • Though #to_html seems like an Exporter method at first look, I feel that it'd be better to have it in daru-view. Of course, it can be ported to daru-io too if that'd be better.

athityakumar avatar Sep 01 '17 21:09 athityakumar

Since it's usually an understanding that master (or default branch) is usually the stable branch, I feel that it'd be better if we continue on daru-1.0 branch.

:+1:

Regarding the port of daru-io, I'd definitely like to contribute in removing obsolete IO parts from daru.

:+1:

IMO, #save and #load methods are better suited to reside within daru.

:+1:

All other IO methods have already been ported to daru-io, and can be removed from daru.

:+1:

Though #to_html seems like an Exporter method at first look, I feel that it'd be better to have it in daru-view.

:+1:

Looks like we are on the same page currently :)

zverok avatar Sep 01 '17 22:09 zverok

daru uses Nyaplot, Gruff plotting libraries. daru-view uses Nyaplot (same as daru but it is able to generate html code) , HighCharts, GoogleCharts for plotting.

So if we want to shift the whole plotting libraries to daru-view then we need to shift the Gruff library to daru-view (without any change) and need some changes in daru-view/nyaplot module.

I will try to do this part of work.

Shekharrajak avatar Sep 02 '17 15:09 Shekharrajak

@zverok looks like a good plan. I would be glad to help on some pieces as time permits.

Since this is toward v1.0, it may be worth reviewing Daru API for consistency. As 1.0 is a major version I would think it would be significantly more difficult to make any API adjustments post 1.0. Two examples that come to mind are the discussion on to/from-read/write on issue #280 and the inconsistency between DataFrame#vector_sum and Vector#sum in issue #391 (the latter ignores nil by default, the former doesn't). With Daru usage growing, it may be worth using the major version change to iron out any API inconsistencies.

A somewhat related question would be, does Daru have a deprecation policy and versioning guideline in place to help with breaking changes? This can help users with migration to newer versions of Daru and becomes important due to its growing adoption.

parthm avatar Sep 03 '17 13:09 parthm

Just too keep track of another API proposal discussed, issue #131 talks about changing #dv to #to_daru_vector.

parthm avatar Sep 03 '17 13:09 parthm

I would be glad to help on some pieces as time permits.

Nice to hear that!

Since this is toward v1.0, it may be worth reviewing Daru API for consistency.

Yes, it would definitely a huge review. As @v0dro officially made me the maintainer, I am planning a Big Autumn Cleanup for a huge part of our ancestry :trollface: Hold your hats.

A somewhat related question would be, does Daru have a deprecation policy and versioning guideline in place to help with breaking changes? This can help users with migration to newer versions of Daru and becomes important due to its growing adoption.

We have deprecate API (which marks methods as deprecated and prints warnings), but to that point never removed anything deprecated. Probably, it is a good point to explicitly set the policy (and even add some automation tool for check it, like "should ve drop this or that methods in current version?")

zverok avatar Sep 03 '17 17:09 zverok

Just too keep track of another API proposal discussed, issue #131 talks about changing #dv to #to_daru_vector.

To be honest, current preferred by community (outside Rails one, ofc) Ruby style is "avoid monkey patches of core classes if only possible". I believe we'll follow this rule in 1.0

zverok avatar Sep 03 '17 19:09 zverok

@baarkerlounger, since you're working on release cycles, will it possible for you to take inspiration from your previous work and other gems and frame a 'release policy' for daru which we will consistetly follow for every single new release in the future? I think it is very important given that this is becoming a popular project and more people have started contributing to it. Please share your thoughts on this issue: https://github.com/SciRuby/daru/issues/411

v0dro avatar Sep 05 '17 13:09 v0dro

Also, I think the next version (after the great cleanup) should be 0.2 not 1.0. We'll release 1.0 once the data storage issue is sorted out: https://github.com/SciRuby/daru/issues/328

v0dro avatar Sep 05 '17 13:09 v0dro

Also, I think the next version (after the great cleanup) should be 0.2 not 1.0. We'll release 1.0 once the data storage issue is sorted out: #328

Well, you are the boss, but I believe with current usage, number of GitHub stars, and so on and so forth -- the new version with seriously updated docs, removed deprecations, new importers/exporters, ... well, I really believe it could be called The One-Point-Oh.

As about #328 -- I am not sure that it should become the main implementation of Daru. Because Daru is useful also as a small, easy-to-install, almost-no-dependencies concept of DataFrame and Vector, and with daru-io and daru-view even more so (like "take those 300 rows from DB and export them to Excel").

I understand the goal to compete with pandas, yet, to be honest, I still believe all internal speedups with dependencies on side libraries and compiled codes should be only plugins, not the core of the library.

Of course, you may have another opinion, and, like I've said, you are the boss.

zverok avatar Sep 05 '17 15:09 zverok

I guess the outcome of this version numbering should be part of the 'release policy' - personally I like have x.y.z with x number bumps being used only for breaking changes that will require user code to change, y being bumped for new features, z being used for bug fixes only.

That way users can have consistent expectations about behaviour from the numbering. So I think 0.2.0 or 1.0.0 should depend on whether the great cleanup is backwards compatible or not.

baarkerlounger avatar Sep 05 '17 15:09 baarkerlounger

Hmmmm yes @baarkerlounger is right. @zverok I don't think there will be too many breaking changes due to the cleanup given that we're essentially only moving out the IO and view methods, right?

About the data storage, I think it will probably be a plugin which will over-ride all core daru functionality once required.

v0dro avatar Sep 05 '17 17:09 v0dro

I also think there should be another release when the currently open PRs are merged.

baarkerlounger avatar Sep 05 '17 17:09 baarkerlounger

Based on autumn cleanup review @zverok mentoned in an earlier note, if we have API consistency being handled in this release, labeling it 1.0 might make sense. In case it's only internal cleanup I suppose we could call it 0.2.

I still believe all internal speedups with dependencies on side libraries and compiled codes should be only plugins, not the core of the library.

Just to cast my vote here, this approach makes sense IMHO. It makes it very easy to get started with Daru. Having a huge install by default can sometimes be painful. I had issues earlier when I tried the full install of SciRuby (IIRC it was some compilation issues with gsl) and nearly ended up going with Pandas. Fortunately, a Daru only install worked smoothly and I could happily use Ruby/Daru.

parthm avatar Sep 05 '17 17:09 parthm

@zverok I don't think there will be too many breaking changes due to the cleanup given that we're essentially only moving out the IO and view methods, right?

Well, my idea of the cleanup (described in original issue) is:

  • move out IO/view
  • it already will introduce some changes and most probably incompatibilities, therefore...
  • remove what was deprecated for a long time
  • go through all unmerged PRs, unfinished issues and style and fix it

So, I've meant that splitting out daru-io is a BIG move, and it could be a reason for dropping the outdated things, and doing a New Big Shiny release.

zverok avatar Sep 06 '17 06:09 zverok

Yes I agree. @zverok do you plan to work on this?

v0dro avatar Oct 08 '17 03:10 v0dro

Right now sitting and looking into the large file with a plan :) Will publish today or tomorrow's morning!

zverok avatar Oct 08 '17 13:10 zverok

I've created branch v-1-pre targeting all "big" (e.g. incompatible) changes and GitHub issues milestone. The latter is NOT complete currently (I am still reviewing through all we have).

As for myself, I am planning currently to work on updating specs (and making a list of methods to deprecate/redesign), and if evrybody is OK with it I'll work just in the branch, without PRs.

zverok avatar Oct 08 '17 22:10 zverok

@zverok why no PRs? Will end up with a potentially huge branch to review?

baarkerlounger avatar Oct 08 '17 22:10 baarkerlounger

@baarkerlounger because I am planning mostly to do "maintenance" tasks (change minimal Ruby version, facelift code style here and there and so on), and also gradually update specs between that, it would be work-in-progress of preparation of Big Release. It would be iterations like "look at Index, fix specs, discuss possible deprecations, fix a bit more specs, cleanup some method" and so on.

(And one huge PR "fix all specs" would be almost un-reviewable anyways.)

zverok avatar Oct 08 '17 22:10 zverok

I've submitted PR #430 to port IO modules away from daru, please review. (I'll resolve the merge conflict along with review changes) 😄

athityakumar avatar Oct 13 '17 10:10 athityakumar

JFYI: not sleeping. Started to refactor specs & docs, which somehow leaded to HUGE lib review, current results (WIP, not passing specs at the moment) in https://github.com/SciRuby/daru/tree/awfully-big-refactoring branch. Current plan is to finish it there, then merge into v-1-pre branch, then fix everything that will left of milestone :) ETA ~1-2 weeks on my side.

zverok avatar Nov 11 '17 12:11 zverok