clumper icon indicating copy to clipboard operation
clumper copied to clipboard

Split Clumper class by functionality

Open samarpan-rai opened this issue 3 years ago • 2 comments

Is your feature request related to a problem? Please describe. Our main class is becoming a monolith. Currently Clumper class is over 1500 lines. The major contributor is the documentation but it still makes it difficult to navigate through while developing.

Describe the solution you'd like Split the class into multiple methods and/or classes. I think we already have a good structure based on tests by functionality. For example, we can split into following system by functionality

  • Read/writing
  • Verbs
  • (other?)

samarpan-rai avatar Apr 02 '21 21:04 samarpan-rai

Normally, you would consider subclassing because parts can be re-used. Scikit-learn has a ClassifierMixin so that all classifiers are consistent for example.

The thing with this package though ... it's really just a single class. Splitting it up feels like it would add complexity instead of removing it. There's something to be said to perhaps group methods together. But I don't know of a clean way to split up a single class across multiple files.

koaning avatar Apr 03 '21 07:04 koaning

Yes, I can see that sub-classing may not be ideal but what if we split the methods based on their functionality. I am not too sure about about Mixins yet.

However, we can split the reading functions in a different file such that we can call them without instantiating the class. I think this also touches on issue #81. This is what pandas does when it loads as seen in their dunder init.py.

import clumper

clump = clumper.read_csv(...) # holds for other read functions

Just moving the read functions can remove 300+ lines from the Clumper class.

samarpan-rai avatar Apr 04 '21 15:04 samarpan-rai