
Integration with Tachyon?

Open · kovasb opened this issue 10 years ago · 7 comments

I would be pretty interested in some built-in support for this. http://tachyon-project.org/

kovasb · Nov 12 '15 21:11

We'll take a look at this. Thanks for the idea! :)

MichaelDrogalis · Nov 12 '15 21:11

Cool. It seems to have huge momentum and many different kinds of use cases.

For instance, you can put the RocksDB data on there and have Tachyon transparently flush cold data to different storage tiers, or move it around from machine to machine.

My use case is more for batch data processing, where workers can write data to Tachyon and let it handle moving that data from machine to machine (and persisting it to permanent storage), with Onyx orchestrating the distribution of work units.

kovasb · Nov 12 '15 21:11

It's going to be a while before we can look at it for that kind of use case (Feb-March); we have other things that are higher priority at the moment. I'll keep it in the back of my mind going forward, though. It would be nice to get a plugin that reads from and writes to Tachyon as a generic input/output stream. That could happen sooner.

MichaelDrogalis · Nov 12 '15 21:11

I'd read about Tachyon a while back and had it on my list of things to check out again later. I'm definitely interested, though as Michael says, it may take some time.

lbradstreet · Nov 13 '15 03:11

Just adding some ammunition here: I built a very fast and very successful distributed computational pipeline that heavily used Tachyon. I think getting the most out of Tachyon might involve some re-architecting of Onyx.

ohpauleez · Jan 05 '16 20:01

Thanks @ohpauleez. We're unlikely to make a major architectural pivot as the streaming engine is performing well (and is a large investment), so we appreciate the data point.

MichaelDrogalis · Jan 05 '16 20:01

We could probably do something similar to Flink and provide a Tachyon input and output plugin, and/or useful lifecycle calls that would allow peers to load data from Tachyon as part of the usual task lifecycle.
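To make the lifecycle-call idea concrete, here is a minimal sketch in Python rather than Onyx's Clojure. It assumes a hypothetical `BlockStore` class standing in for a Tachyon client (no real Tachyon API is used), and the hook and runner names are illustrative only. A before-task-start hook preloads the data a task needs and merges it into the task's event map, so the task body reads it like any other input.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a Tachyon client: maps file paths to lists of records.
class BlockStore:
    def __init__(self, files: Dict[str, List[dict]]):
        self._files = files

    def read(self, path: str) -> List[dict]:
        return list(self._files[path])

# A lifecycle hook receives the event map and returns extra keys to merge into it.
Lifecycle = Callable[[dict], dict]

def load_from_store(store: BlockStore, path: str) -> Lifecycle:
    """Build a hypothetical before-task-start hook that preloads `path`."""
    def before_task_start(event: dict) -> dict:
        return {"preloaded": store.read(path)}
    return before_task_start

def run_task(task_fn: Callable[[dict], int], lifecycles: List[Lifecycle], event: dict) -> int:
    # Run every before-task-start hook in order, merging each result into the event map.
    for hook in lifecycles:
        event = {**event, **hook(event)}
    return task_fn(event)

def sum_amounts(event: dict) -> int:
    # The task body only sees the event map; it does not know where the data came from.
    return sum(r["amount"] for r in event["preloaded"])

store = BlockStore({"/batch/part-0": [{"amount": 3}, {"amount": 4}]})
result = run_task(sum_amounts, [load_from_store(store, "/batch/part-0")], {"task": "sum"})
print(result)  # 7
```

Keeping the loading in a hook rather than in the task function means the same task could run against local test data or a real store just by swapping the hook.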

With our upcoming new scheduler, we could probably get even greater improvements by scheduling the tasks that need that data close to where it is stored in Tachyon, which would give us some nice data-locality properties.
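The locality idea can be sketched as a greedy assignment in Python. This is not Onyx's scheduler. It assumes we already know which hosts hold each task's input (the `block_hosts` map is hypothetical) and simply prefers a free peer on one of those hosts, falling back to any free peer otherwise.

```python
from typing import Dict, Set

def schedule(tasks: Dict[str, str],
             block_hosts: Dict[str, Set[str]],
             peers: Dict[str, str]) -> Dict[str, str]:
    """Assign each task to one peer, preferring peers co-located with the task's data.

    tasks:       task name -> path of the data it reads (hypothetical)
    block_hosts: path -> hosts whose storage holds that data (hypothetical)
    peers:       peer id -> host the peer runs on
    """
    free = dict(peers)
    assignment: Dict[str, str] = {}
    for task, path in sorted(tasks.items()):
        local_hosts = block_hosts.get(path, set())
        # Free peers running on a host that already stores this task's data.
        local = [p for p, h in sorted(free.items()) if h in local_hosts]
        # Fall back to any free peer when no co-located peer is available.
        candidates = local or sorted(free)
        if not candidates:
            raise RuntimeError(f"no free peer for task {task!r}")
        peer = candidates[0]
        assignment[task] = peer
        del free[peer]
    return assignment

assignment = schedule(
    tasks={"agg-a": "/data/a", "agg-b": "/data/b"},
    block_hosts={"/data/a": {"host2"}, "/data/b": {"host1"}},
    peers={"p1": "host1", "p2": "host2"},
)
print(assignment)  # {'agg-a': 'p2', 'agg-b': 'p1'}
```

A greedy pass like this is easy to reason about but can strand later tasks on remote peers; a real scheduler would likely weigh locality against load balance and other constraints.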

This is still not a priority for us, and we haven't seen the demand yet, but if anyone is interested enough, I'd be happy to devote time to answering questions and helping where I can.

lbradstreet · Jan 05 '16 22:01