
Integration with ML Flink

Status: Open. rcabanasdepaz opened this issue 8 years ago • 5 comments

General info: https://cwiki.apache.org/confluence/display/FLINK/FlinkML%3A+Vision+and+Roadmap

Quick start: https://ci.apache.org/projects/flink/flink-docs-release-0.9/libs/ml/quickstart.html

Speech about flinkML: http://es.slideshare.net/TheodorosVasiloudis/flinkml-large-scale-machine-learning-with-apache-flink

rcabanasdepaz, Oct 20 '16 11:10

Hello Rafael, just ran into this issue :)

Could you give a few more details about your plans?

We are going to start development on an online learning library for Flink soon, so we are looking at our options for what to include in it, and we may also want to bring in some of the work that has been done as part of the AMIDST project.

thvasilo, Oct 25 '16 11:10

Hello Theodore, this issue is at a very early stage of development. Our idea is to make it possible to use any of the latent variable models provided by AMIDST with FlinkML data structures (e.g., DataSet[LabeledVector]). This functionality will be used from Scala. To the best of our knowledge, FlinkML cannot yet be used from Java, or at least not its full functionality.

However, our toolbox is already integrated with (standard) Flink through the flinklink module, which lets you learn PGMs and perform inference on them in a cluster environment. More details are given in the documentation on our website:

http://www.amidsttoolbox.com/documentation/0-6-0/examples-060/flinklink-060/

All news about this issue will be published here. Alternatively, you can follow news about the toolbox on Twitter: https://twitter.com/AmidstToolbox

rcabanasdepaz, Oct 25 '16 11:10

Cool, let me know if you need any help. If you think some of your work would make sense to port to FlinkML, we can talk about that as well. For example, we still don't have a Naive Bayes model, which I see is included here.

You are right that we don't currently support Java in FlinkML; unfortunately, as far as I know, there are no plans to add it in the near future.

I'll check out the rest of the toolbox, thanks for the info!

thvasilo, Oct 25 '16 12:10

The idea of porting some of the functionality in AMIDST to FlinkML sounds good. Do you have any documentation on how to contribute to FlinkML? Porting the Naive Bayes model would clearly be interesting, but so would porting some other, much more powerful classifiers.

rcabanasdepaz, Oct 25 '16 12:10

Sure, our contribution guide is here. If somebody from your team is interested in porting AMIDST code to FlinkML, I'll be able to help them personally.

thvasilo, Oct 25 '16 12:10