
Feature: Autogen Frequent Pattern Matching (and other SparkML models) for .NET for Apache Spark project

Open rrekapalli opened this issue 2 years ago • 7 comments

Team, is there a plan to implement the full Spark MLlib, especially ML.fpm (Frequent Pattern Mining), anytime soon? As of Spark v3.1.2 it has only 2 algorithms (FP-Growth & PrefixSpan), and both are very useful ML algorithms in some scenarios. It would be a great help if it were part of this library. Thanks.

AB#1946967

rrekapalli avatar Aug 25 '22 05:08 rrekapalli

Hey @rrekapalli :wave:! Thank you so much for reporting the issue/feature request :rotating_light:. Someone from SynapseML Team will be looking to triage this issue soon. We appreciate your patience.

github-actions[bot] avatar Aug 25 '22 05:08 github-actions[bot]

Hey @rrekapalli, thanks for reaching out. I'm not sure what the request is here, but SparkML seems to already support these algorithms: https://spark.apache.org/docs/latest/ml-frequent-pattern-mining.html#:~:text=Mining%20frequent%20items%2C%20itemsets%2C%20subsequences,rule%20learning%20for%20more%20information. Our library is completely interoperable with SparkML, so feel free to mix this into your SynapseML models and pipelines.

mhamilton723 avatar Aug 26 '22 00:08 mhamilton723

Hi @mhamilton723, thank you for the quick response!
I believe SynapseML depends on .NET for Apache Spark, which does not have a full implementation of SparkML. I thought SynapseML would have full interoperability (specifically, C# bindings) with SparkML, but I could not find any references to "org.apache.spark.ml.fpm" in this repo. I would appreciate it if you could point me to a reference for this feature.

rrekapalli avatar Aug 26 '22 05:08 rrekapalli

Ahhhh yes, I get what you are saying now. @serena-ruan will eventually contribute the generated SparkML bindings back to the Spark.NET team. I will let her comment on timelines and whatnot.

mhamilton723 avatar Aug 30 '22 16:08 mhamilton723

Thank you, @mhamilton723 !

rrekapalli avatar Aug 31 '22 05:08 rrekapalli

@rrekapalli Thanks for raising this feature request! As you can see, not all SparkML models are currently supported in .NET for Apache Spark, but FPM looks like a typical case we could solve by applying our codegen bindings. I can't give you a precise ETA at this moment, because even after we contribute to the dotnet/spark repo, I think we need to wait until Microsoft.Spark cuts a newer release before the feature can be used officially. But I'll give it a try within this week or early next week, and keep you updated :D

serena-ruan avatar Aug 31 '22 15:08 serena-ruan

Thank you very much for taking this up, @serena-ruan! I'll be eagerly waiting for this to be part of this repo. Really appreciate your effort!

rrekapalli avatar Aug 31 '22 15:08 rrekapalli