smartcore
First draft of the new n-dimensional arrays + NB use case
This PR will eventually solve #85. The goal is to have an abstract layer that can be implemented by a concrete library and that enables all the traditional functionality provided by n-dimensional arrays: array manipulation, transformation, basic linear algebra routines, views, etc.
The abstract layer I am proposing here enables us to:
- have arrays that can hold multiple types, including strings
- implement more efficient ML algorithms that minimize memory copy operations
- have ML algorithms that declare more granular requirements on input and output types, e.g. we can declare an input type for an algorithm that only needs a simple non-mutable array without the typical linear algebra routines.
As a bonus, this PR introduces a row-major dense matrix that might come in handy in some cases.
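To make the idea of granular type requirements concrete, here is a minimal hypothetical sketch of such a layered design: a read-only trait for algorithms that only consume data, a mutable extension for those that write in place, and a row-major dense matrix implementing both. All names here are illustrative, not the actual identifiers from the PR.

```rust
// A read-only array view: enough for algorithms that only consume data.
trait ArrayView<T> {
    fn shape(&self) -> (usize, usize);
    fn get(&self, row: usize, col: usize) -> &T;
}

// A mutable extension, required only by algorithms that write in place.
trait MutArrayView<T>: ArrayView<T> {
    fn set(&mut self, row: usize, col: usize, value: T);
}

// A minimal row-major dense matrix backing both traits.
struct DenseMatrixRM<T> {
    values: Vec<T>,
    nrows: usize,
    ncols: usize,
}

impl<T> ArrayView<T> for DenseMatrixRM<T> {
    fn shape(&self) -> (usize, usize) {
        (self.nrows, self.ncols)
    }
    fn get(&self, row: usize, col: usize) -> &T {
        // Row-major layout: elements of a row are adjacent in memory.
        &self.values[row * self.ncols + col]
    }
}

impl<T> MutArrayView<T> for DenseMatrixRM<T> {
    fn set(&mut self, row: usize, col: usize, value: T) {
        self.values[row * self.ncols + col] = value;
    }
}

fn main() {
    let mut m = DenseMatrixRM { values: vec![0i32; 6], nrows: 2, ncols: 3 };
    m.set(1, 2, 42);
    assert_eq!(*m.get(1, 2), 42);
}
```

An algorithm that only reads data would then bound its input by `ArrayView<T>` alone, so callers can pass immutable or borrowed data without satisfying mutation or linear-algebra bounds.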
@morenol there are multiple problems with this PR, like printlns, lack of documentation, etc., and I plan to fix these within the next 3-4 weeks. But I thought I'd share this PR with you so that you can take a look at the proposed changes earlier. Besides, I've decided to try the new abstract layer on NB first.
Codecov Report
Merging #108 (e99bd70) into v0.5-wip (e99bd70) will not change coverage. The diff coverage is n/a.
:exclamation: Current head e99bd70 differs from pull request most recent head 55d2c16. Consider uploading reports for the commit 55d2c16 to get more accurate results.
@@            Coverage Diff            @@
##           v0.5-wip     #108   +/-  ##
=========================================
  Coverage     38.72%   38.72%
=========================================
  Files            86       86
  Lines          7031     7031
=========================================
  Hits           2723     2723
  Misses         4308     4308
@VolodymyrOrlov this looks nice.
Since we are rethinking the way the arrays are represented, I think we should consider using const generics.
Any concrete ideas on how we can use const generics to improve the proposed abstract layer? For example, const generics could be used to define the number of dimensions in S
I think that I edited your comment instead of a response: here it is
Any concrete ideas on how we can use const generics to improve the proposed abstract layer? For example, const generics could be used to define the number of dimensions in S
I am not sure how we can use them in the abstract layer. I was thinking of the structs defined in linalg::dense::. We could create a special type for statically sized arrays. But maybe it is better to keep it simple and use nalgebra and ndarray when we want that kind of optimization.
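For reference, the const-generics idea discussed here could look roughly like the following sketch: a matrix whose dimensions live in the type, so shape mismatches are rejected at compile time. The names are made up for illustration; nothing below is in the PR.

```rust
// A statically sized matrix: dimensions R and C are part of the type.
struct StaticMatrix<T, const R: usize, const C: usize> {
    values: [[T; C]; R],
}

impl<T: Copy + Default, const R: usize, const C: usize> StaticMatrix<T, R, C> {
    // Fill with the default value (0 for numeric types).
    fn zeros() -> Self {
        StaticMatrix { values: [[T::default(); C]; R] }
    }

    // The shape comes from the type itself; no runtime bookkeeping needed.
    fn shape(&self) -> (usize, usize) {
        (R, C)
    }
}

fn main() {
    let m: StaticMatrix<f64, 2, 3> = StaticMatrix::zeros();
    assert_eq!(m.shape(), (2, 3));
}
```

The trade-off mentioned in the thread is real: this buys compile-time shape checks and stack allocation for small fixed-size problems, at the cost of extra generic machinery, which nalgebra already provides.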
Great work Volodymyr! This looks really promising. Just a few notes to check if I understood the new abstraction and to help devs in digging:
- we are going to have new trait groups to define types for array/vector elements: `Num`/`Number` and `FloatNumber` (I like it!), directly using `num_traits`;
- a lot of things going on in `linalg/base.rs`:
  - new smartcore `Array`-related type declarations to provide a unified interface, including the ones for optimised strided reads/writes;
  - this interface is the toolbox to be leveraged by higher-level types, in particular the new implementations in `linalg/dense/`;
- we now have "view" types that, I assume, are to provide segmentation of the underlying structs so as to allow reusing the same instantiated struct and avoid copying;
- super cool, the "Decomposable"s for struct decomposition.

Will help with documentation as soon as it is relatively stable.
@steboss check it out!
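The "view" idea mentioned above (segmenting an underlying struct to avoid copies) can be sketched in a few lines: a view borrows a sub-range of an existing buffer instead of cloning it. This is a minimal illustration with made-up names, not the PR's actual view types.

```rust
// A view borrows a slice of the parent's data; the lifetime 'a ties the
// view to the owner, so no data is copied and no use-after-free is possible.
struct VecView<'a, T> {
    data: &'a [T],
}

impl<'a, T> VecView<'a, T> {
    // Borrow the half-open range [start, end) of the parent slice.
    fn new(parent: &'a [T], start: usize, end: usize) -> Self {
        VecView { data: &parent[start..end] }
    }

    fn len(&self) -> usize {
        self.data.len()
    }

    fn get(&self, i: usize) -> &T {
        &self.data[i]
    }
}

fn main() {
    let owned = vec![10, 20, 30, 40, 50];
    let view = VecView::new(&owned, 1, 4); // borrows, does not clone
    assert_eq!(view.len(), 3);
    assert_eq!(*view.get(0), 20);
}
```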
That's right! The goal is to build a rich interface for 1- and 2-dimensional arrays that can be extended by anyone later. On the one hand, we can make ML algorithms more efficient by relying on a concrete library, like nalgebra or ndarray. On the other hand, investing time into an abstract layer that facilitates integration with 3rd party libraries gives us a lot of flexibility and saves time later, if we find it necessary to bring in some other library in the future.
@morenol please take a look, there are still missing implementations but the code is at least cleaned up.
- conflicts with `development` should now be solved
- run the linter, try to compile
- there are some types that are not correct, in particular in `src/linear/ridge_regression.rs` and `src/linear/logistic_regression.rs`
- tests need to be double-checked against the new typed structures and implementations; possibly write new ones
- there is a problem with distance metrics
- ... (probably hundreds of things more) 💯
please fetch this branch and run `cargo build --lib`, maybe there are some errors for which the solution is evident to you people.
I can take a look later today
sure, when you have time. If we can fix the static analyzer (in `src/ensemble`) and the compiler errors, maybe we can move on to running tests 👍🏼
Also I don't understand how the implementation of `RandomForestClassifier.samples` should work, as in this branch the property has been removed.
@montanalow please take a look at the `ensemble` and `linear` modules as you are more familiar with those procedures. The objective is to make things compile and then move on to making the tests pass. Also I saw some conflicts when pulling the very latest changes.
great job @morenol
Can you tell me why we are missing definitions for `ElasticNetSearchParameters`, `LassoSearchParameters`, and `LinearRegressionSearchParameters`, which are instead present in `development`?
I will try a merge, another hundred conflicts to solve
> `ElasticNetSearchParameters`

Not completely sure; we probably need to split this into several PRs once we have all the conflicts solved, in order to make it easy to review.
I've got most of the merge conflicts sorted in my personal branch, but we need to apply many of the updates in this branch to the new search params, and apply the rng changes to this branch. I think I understand it enough to get it done in an hour or so.
@montanalow great! I was splitting the changes into multiple branches but if you can do in one pass that would be awesome.
#171 is up that should resolve all merge conflicts in this branch, although it's a fairly large change set...
The library now compiles but there are still a lot of runtime errors and tests to fix.
I do not yet have a clear understanding of what is going to be the substitute for `DenseMatrix`; I thought it should have been `Array2`, but how to instantiate it? For example this test. Also `predict_oob` is still broken, as I don't know how to implement it without using the `samples` property that has been removed (probably it is supposed to use array views?).
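One possible way `predict_oob` could work without a stored `samples` property is to recompute the out-of-bag indices from each tree's bootstrap sample. This is purely a sketch of that option under that assumption, not the PR's actual implementation:

```rust
// Rows that were never drawn into a tree's bootstrap sample are
// out-of-bag for that tree and can be used for its OOB predictions.
fn oob_indices(n_samples: usize, bootstrap: &[usize]) -> Vec<usize> {
    let mut in_bag = vec![false; n_samples];
    for &i in bootstrap {
        in_bag[i] = true;
    }
    (0..n_samples).filter(|&i| !in_bag[i]).collect()
}

fn main() {
    // 5 rows; the bootstrap drew rows 0, 0, 2, 4, so rows 1 and 3 are OOB.
    let oob = oob_indices(5, &[0, 0, 2, 4]);
    assert_eq!(oob, vec![1, 3]);
}
```

An array view over those indices would then give each tree its OOB rows without copying the training matrix.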
I am still not sure we can merge this all at once; I will probably start by opening a PR to merge `num.rs` so we can do it gradually.
I am seeing discrepancies between the `metrics::recall` module in `development` and the one in this branch. Things are getting too entangled; please move and follow up in #173 for a more stepwise implementation.
Please help, it is quite easy: just follow the implementations in the other ported modules. cc: @morenol @montanalow @titoeb
Missing modules:
- `preprocessing`
- `readers`
- `svm`
All work moved to https://github.com/smartcorelib/smartcore/tree/v0.5-wip