sklearn functions?
Functions to cluster data on these faster platforms would, I would guess, be useful to many people. This includes some techniques already accessible through ulab, like PCA via the linalg module methods, but others, like t-SNE or UMAP, remain very inaccessible (to my limited perception, anyway).
Would it be a suitable feature request to port some well-performing functions from sklearn to ulab? Personally, I'd deem UMAP most useful.
We have considerably re-worked ulab (https://github.com/v923z/micropython-ulab/tree/tensor), and I would like to release the new version in the next couple of days. That branch supports higher-dimensional tensors, proper views, broadcasting etc.
Once we are done with that, I would have time to consider new functions. In fact, one important feature of the new version is that it is configurable via a single header file, which means that it would be possible to implement an arbitrary number of functions, and if the whole package doesn't fit the flash, the irrelevant functions can easily be switched off. So, definitely, if you have suggestions, I am listening.
The only problem is that I have no experience with sklearn, and I don't even know what is useful, or where I would begin. So, if you want that to be incorporated, you would have to give a bit more guidance. If you feel that you could contribute to the code, that would be even better. On this note, the only item blocking the release of the new version is that I haven't yet finished the programming manual. What I want to say with that is that in a couple of days the contribution threshold to ulab will be significantly lower.
Amazing progress, thanks so much to everyone who is involved!
Unfortunately I don't code a lot in python, mostly MATLAB. What is useful is one thing, but what can be run at reasonable speed on a current micropython board is another, of course.
This should be an open debate with real experts, but from a data science point of view, UMAP (link to python code) is new and very popular, and it would be very attractive if it could be run on a microcontroller, say, even in a loop, to analyze real-time (or close to real-time) data. It is the recent alternative to TSNE clustering, which is probably much easier to implement from scratch.
It is conceivable to easily improve current IMU gesture recognition as demonstrated here with these algorithms!
Personal vote for this priority, with sklearn links:
> This should be an open debate with real experts, but from data science point of view, UMAP (link to python code) is new and very popular, and would be very attractive if it could be run on a microcontroller, say, even in a loop, to analyze real-time (or close to real-time) data. This is the recent alternative to the (probably much more easily implementable (from scratch)) TSNE clustering.
>
> It is conceivable to easily improve current IMU gesture recognition as demonstrated here with these algorithms!
If I understood the post correctly, the training took place on a PC, with the help of sklearn, and only the trained model was compiled into the arduino firmware.
> 1. [PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html) (this is simple to implement after covariance() and eigenvector() functions)
eigenvector is already supported, and covariance should not be hard to add.
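A covariance function along those lines could be sketched in plain numpy (ulab exposes a numpy-compatible subset, so the same idiom should translate to the device; the name `covariance` is just illustrative, not an existing ulab function):

```python
import numpy as np  # ulab provides a numpy-compatible subset on-device

def covariance(data):
    """Sample covariance matrix of data with shape (n_samples, n_features)."""
    centered = data - np.mean(data, axis=0)  # subtract the mean per feature
    # (n_features, n_features) matrix, normalized by n_samples - 1
    return np.dot(centered.T, centered) / (data.shape[0] - 1)
```

For a 2D array with observations in rows, this matches what `np.cov(data, rowvar=False)` returns on a PC.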
> 2. [UMAP](https://umap-learn.readthedocs.io/en/latest/basic_usage.html)
This seems to be more involved, and I think we would have to tread a bit more carefully here. But if you are familiar with the subject, you could perhaps generate some benchmarks on a PC. If analyzing a handful of data entries on a PC requires seconds, and MBs of RAM, then there is no point in trying to port the code. In fact, the first question is: what is the size of data sets that are reasonable on a microcontroller?
> 3. and later: [SVM](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html) (as demonstrated in the blogs), [TSNE](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html)
Is this library not for data visualisation purposes? Also, the function has at least 30 arguments. Which are the most relevant?
Yours is an ambitious proposal, and I was wondering whether we could point to 2-3 functions that would be a reasonable starting point. I am not sure I have the resources to implement something that can run only on the biggest and strongest microcontrollers. In fact, wouldn't a raspberry pi be a better platform for this?
yes, a raspberry pi would be better suited for these tasks, however, it's limited by its size.
In terms of feasibility on the given platforms, I think if the algorithms can run on a ring buffer / continuous data stream of usual size (order of magnitude 1-5000 double values?), this would be good enough for e.g. IMU data.
PCA will be the easiest to start with (and, I think, SVM). TSNE usually takes RAM and time (this heavily depends on how big the data is, obviously; I'm guessing it doesn't scale linearly but rather quadratically or worse), and AFAIK so does UMAP.
You correctly point out that the training/classifying of the pca and svm took place offline in the blog - which is why it would be so powerful to do this on-device (previously hard to do/unheard of?). I'm sorry if the SVM link is bad; as I said, I'm a bit of a noob in the python implementation of these algorithms, mea culpa.
to answer: covariance and pca (maybe pca angle) would be a very intriguing and probably easy starting point, maybe others/experts have more input. I would predict/guess for PCA most users would be fine with extraction of the first 1-3 principal components.
edit: sorry for accidental close
> yes, a raspberry pi would be better suited for these tasks, however, it's limited by its size.
Do you mean it is too big (mechanically)? Actually, the raspberry pi zero is not significantly larger than an adafruit feather board. https://www.raspberrypi.org/products/raspberry-pi-zero/?resellerType=home You can also buy the compute module, which is smaller than the standard raspberry pi, but exposes more IO pins.
> In terms of feasibility on the given platforms, I think if the algorithms can run on a ring buffer / continuous data stream of usual size (order of magnitude 1-5000 double values?), this would be good enough for e.g. IMU data.
As you have already seen in https://github.com/v923z/micropython-ulab/issues/179, constructing a circular buffer is not quite trivial. If you have an opinion on this, you could make it heard in that issue.
I have seen that someone was calculating 16384-point FFTs with ulab, and that requires 65536 floats (approx. 250 kB), so 5000 should definitely be doable.
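As a back-of-envelope check, here is my guess at how the 65536-float figure arises (real and imaginary arrays for both input and output; the exact buffering inside ulab's FFT may differ):

```python
N = 16384                  # FFT length in points
floats = 4 * N             # guess: real + imaginary arrays for input and output
bytes_needed = floats * 4  # single-precision floats are 4 bytes each

print(floats)                  # 65536
print(bytes_needed // 1024)    # 256 (kB), consistent with "approx. 250 kB"
```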
> PCA will be the easiest to start with (and I think, SVM). TSNE usually takes ram and time (this heavily depends on how big the data is, obviously, I'm guessing this doesn't really scale linearly but rather quadratically or worse), and AFAIK so does UMAP.
>
> You correctly point out, the training/classifying of the pca and svm took place offline in the blog - which is why it would be so powerful to do this (previously hard to do/unheard of?).
The problem is, there might be a reason for "unheard of". Machine learning is not cheap (in terms of clock cycles), and might need dedicated hardware, like the TPU https://en.wikipedia.org/wiki/Tensor_Processing_Unit.
> I'm sorry if the SVM link is bad, as I said I'm a bit of a noob in the python implementation of these algorithms, mea culpa
Don't worry, this is why we actually discuss issues. However, I am not an expert either...
> to answer: covariance and pca (maybe pca angle) would be a very intriguing and probably easy starting point, maybe others/experts have more input. I would predict/guess for PCA most users would be fine with extraction of the first 1-3 principal components.
Is it correct that you only need the eigenvectors, and the covariance matrix for that?
> edit: sorry for accidental close
No problem.
> > yes, a raspberry pi would be better suited for these tasks, however, it's limited by its size.
>
> Do you mean it is too big (mechanically)? Actually, the raspberry pi zero is not significantly larger than an adafruit feather board. https://www.raspberrypi.org/products/raspberry-pi-zero/?resellerType=home You can also buy the compute module, which is smaller than the standard raspberry pi, but exposes more IO pins.
You are totally right about the size of the boards - I was more thinking about the minimalistic size of the standard ESP32 chip itself - it's quite small to bring 2 cores, bluetooth and wifi to the table.
> > In terms of feasibility on the given platforms, I think if the algorithms can run on a ring buffer / continuous data stream of usual size (order of magnitude 1-5000 double values?), this would be good enough for e.g. IMU data.
>
> As you have already seen in #179, constructing a circular buffer is not quite trivial. If you have an opinion on this, you could make it heard in that issue.
After reading more about this topic (here's a great intro explaining the importance), what is really needed is a queue element that allows some form of enqueue and dequeue functionality (and, optimally, accessing all of the values in one go). Since this neatly explains that a queue is really an implementation of a ringbuffer, I will comment there then.
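A minimal sketch of that interface, written against CPython's `collections.deque` (MicroPython's deque is more restricted, so treat this as an illustration of the enqueue/dequeue/snapshot semantics rather than a drop-in on-device implementation):

```python
from collections import deque  # CPython; MicroPython's deque differs slightly

class RingBuffer:
    """Fixed-size FIFO: enqueueing when full silently drops the oldest value."""

    def __init__(self, size):
        self.buf = deque(maxlen=size)

    def enqueue(self, value):
        # with maxlen set, append discards the oldest item automatically
        self.buf.append(value)

    def dequeue(self):
        # removes and returns the oldest value
        return self.buf.popleft()

    def snapshot(self):
        # access all current values 'in one go', oldest first
        return list(self.buf)
```

For example, enqueueing 1, 2, 3, 4 into a 3-slot buffer leaves [2, 3, 4], and a subsequent dequeue returns 2.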
> I have seen that someone was calculating 16384-point FFTs with ulab, and that requires 65536 floats (approx. 250 kB), so 5000 should definitely be doable.
I think this will suffice, since users can essentially downsample, resample, smooth or otherwise compress the data to smaller sizes - although I'm almost certainly not considering all use-cases here.
> > PCA will be the easiest to start with (and I think, SVM). TSNE usually takes ram and time (this heavily depends on how big the data is, obviously, I'm guessing this doesn't really scale linearly but rather quadratically or worse), and AFAIK so does UMAP. You correctly point out, the training/classifying of the pca and svm took place offline in the blog - which is why it would be so powerful to do this (previously hard to do/unheard of?).
>
> The problem is, there might be a reason for "unheard of". Machine learning is not cheap (in terms of clock cycles), and might need dedicated hardware, like the TPU https://en.wikipedia.org/wiki/Tensor_Processing_Unit.
Maybe - but for small data / queues / ringbuffers this may not apply. I think that with two processing cores (in the case of the ESP32) and a 240 MHz clock speed, it stands to reason that it's worth a try! Obviously, classical machine learning training algorithms are not suited for microcontrollers. However, running trained (and possibly minimized) networks is very feasible!
> > to answer: covariance and pca (maybe pca angle) would be a very intriguing and probably easy starting point, maybe others/experts have more input. I would predict/guess for PCA most users would be fine with extraction of the first 1-3 principal components.
>
> Is it correct that you only need the eigenvectors, and the covariance matrix for that?
Yes, the PCA algorithm boils down to taking the eigenvectors of the covariance matrix, after subtracting the mean in every dimension. Here's a nice step-by-step description.
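Those steps (subtract the mean, form the covariance matrix, take its eigenvectors, project onto the leading ones) can be sketched in plain numpy; ulab's subset provides similar building blocks, though the `pca` function here is illustrative, not an existing ulab routine:

```python
import numpy as np

def pca(data, n_components=2):
    """Project data of shape (n_samples, n_features) onto its first principal components."""
    centered = data - np.mean(data, axis=0)   # subtract the mean in every dimension
    cov = np.dot(centered.T, centered) / (data.shape[0] - 1)  # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh: the covariance matrix is symmetric
    order = np.argsort(eigvals)[::-1]         # sort components by descending variance
    components = eigvecs[:, order[:n_components]]
    return np.dot(centered, components)       # scores in the reduced space
```

For the "first 1-3 principal components" use case mentioned above, one would simply call this with `n_components=1` to `3`.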