
Problem in directly querying data in pail structure

Open pankug opened this issue 9 years ago • 1 comments

I am having a problem with our Lambda Architecture setup. Our data is stored in HDFS in a fact-based pail format, using Thrift serialization schemas and vertical partitioning.

Is there any direct way to query our data (residing in HDFS) in the batch view, so that we don't have to store our output in ElephantDB or another database, and can instead view the data directly and store it in a readable format?

We are currently stuck on this. Could you help us with the code structure and any other techniques, using any set of Big Data tools and languages?

pankug avatar Jan 14 '16 06:01 pankug

By "directly query", do you mean random-access lookups on a key inside HDFS? Not that I know of.


— Reply to this email directly or view it on GitHub https://github.com/nathanmarz/cascalog/issues/299.

sritchie avatar Jan 14 '16 18:01 sritchie