Nick Pentreath
Ah, sorry about that - it was a rush to push it out in time for the Spark Summit talk. Should be fixed in https://github.com/MLnick/glint-fm/commit/19de327879b8d2645b60436ac93afd8b16f62dec
By the way, the code here is really very rough and more of a PoC than anything near production-ready :)
Hi, I'm afraid I'm not able to actively maintain this at the moment. But any PR would be welcome :) Nick
Hmm, will take a look. It could be a memory leak, or perhaps with large datasets something around merging all the intermediate HLL instances. Unfortunately Hive UDFs are really tricky to...
Something that might work is to try increasing the split size in Hadoop, thus decreasing the effective number of mappers. This could potentially alleviate the intermediate merging pressure (in...
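To make the split-size suggestion concrete, here is a rough Python sketch of how a larger minimum split size reduces the mapper count. It assumes the standard Hadoop `FileInputFormat` rule, `max(minSize, min(maxSize, blockSize))`, and ignores the slop factor and any combining input format that Hive may be configured to use. The file sizes and the 512 MB setting are hypothetical values chosen for illustration.

```python
import math

MB = 1024 * 1024
NO_LIMIT = 2**63 - 1  # stands in for an unset maximum split size


def compute_split_size(block_size, min_size, max_size):
    """Clamp the block size between the minimum and maximum split size."""
    return max(min_size, min(max_size, block_size))


def estimate_mappers(file_sizes, split_size):
    """Roughly one mapper per split; each file is split independently."""
    return sum(math.ceil(size / split_size) for size in file_sizes if size > 0)


# Hypothetical input: ten files of 1 GiB each on a 128 MB block size.
files = [1024 * MB] * 10

default_split = compute_split_size(128 * MB, 1, NO_LIMIT)
larger_split = compute_split_size(128 * MB, 512 * MB, NO_LIMIT)

mappers_default = estimate_mappers(files, default_split)
mappers_larger = estimate_mappers(files, larger_split)

print(mappers_default, mappers_larger)
```

Fewer mappers means fewer intermediate HLL instances to merge, at the cost of more work per mapper. In Hadoop 2 and later the minimum is normally set through `mapreduce.input.fileinputformat.split.minsize`; whether that setting takes effect for a given Hive job depends on the input format in use.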
Hi, sorry for the long delay - I've been swamped and I no longer use Hive these days. I'll try to take a look if I get time. But it seems...
The basics for Spark-TF interop [here](https://github.com/tensorflow/ecosystem/blob/master/spark/spark-tensorflow-connector/README.md) may be helpful?
Indeed it's a problem that still has not been solved very well. I think TF-on-spark is arguably one of the better options (still leaves a lot to be desired though)...
Also: https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow-distributor ?
@dgoldenberg-audiomack I agree, the experience is far from seamless currently. The closest thing I've seen is actually https://analytics-zoo.readthedocs.io/en/latest/doc/Orca/Overview/orca.html (from the BigDL team at Intel, I think). I plan to give it...