elephant-bird

Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.

92 elephant-bird issues

I'm trying to use DelegateCombineFileInputFormat + LzoTextInputFormat + LzoTextOutputFormat. I'm also trying to specify the maxSplitSize for combining files. I've found that DelegateCombineFileInputFormat doesn't honor maxSplitSize, minSplitSizeNode, or minSplitSizeRack if...
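To make concrete what honoring maxSplitSize means, here is a minimal pure-Java sketch of greedy split combining. It is not elephant-bird's DelegateCombineFileInputFormat or Hadoop's CombineFileInputFormat logic. The class SplitPacker and its pack method are hypothetical, sizes are plain byte counts, and locality as well as the per-node and per-rack minimums are deliberately ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only, NOT elephant-bird or Hadoop code.
// Packs per-file split sizes (bytes) into combined splits whose total never
// exceeds maxSplitSize, unless a single input is already larger than the cap.
final class SplitPacker {
    private SplitPacker() {}

    static List<List<Long>> pack(List<Long> splitSizes, long maxSplitSize) {
        if (maxSplitSize <= 0) {
            throw new IllegalArgumentException("maxSplitSize must be positive");
        }
        List<List<Long>> combined = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : splitSizes) {
            // Close the current group if adding this split would overflow the cap.
            if (!current.isEmpty() && currentBytes + size > maxSplitSize) {
                combined.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            combined.add(current); // flush the last, possibly partial, group
        }
        return combined;
    }
}
```

For example, packing [40, 30, 50, 10] with a cap of 64 yields [[40], [30], [50, 10]]. An input larger than the cap still becomes its own split rather than being dropped or merged.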

Hi, I am facing the error below even though all prerequisites are met, including Thrift 0.7.0, protoc, and JDK 1.7.55.
[INFO]
[INFO] Elephant Bird ...................................... SUCCESS [01:06 min]
[INFO] Elephant Bird Hadoop...

[ERROR] Failed to execute goal on project elephant-bird-hadoop-compat: Could not resolve dependencies for project com.twitter.elephantbird:elephant-bird-hadoop-compat:jar:4.6-SNAPSHOT: Failure to find org.apache.hadoop:hadoop-client:jar:1.1.2 in http://repo1.maven.org/maven/ was cached in the local repository, resolution...

This is a refinement of #398. We want to make sure that empty splits are not dropped but, more radically, we want better incorporation of locality. @gerashegalov, @sjlee, would love...

This is a refinement of #398. The implementation lifted from Pig is a bit ugly and could be better tested. Better tests will allow us to do more radical refinement...
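A minimal, easily unit-tested sketch of the two goals named in these refinement issues: keep empty splits, and group splits by locality. LocalityGrouper, its Split class, and the one-preferred-host-per-split model are hypothetical simplifications; this is not the Pig-derived code the issue refers to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not elephant-bird or Pig code: groups splits by their
// preferred host so each combined split stays node-local, and keeps
// zero-length splits instead of filtering them out.
final class LocalityGrouper {
    static final class Split {
        final String path;
        final long length;
        final String host; // assumed: exactly one non-null preferred host

        Split(String path, long length, String host) {
            this.path = path;
            this.length = length;
            this.host = host;
        }
    }

    private LocalityGrouper() {}

    static List<List<Split>> group(List<Split> splits, long maxBytes) {
        // TreeMap gives a deterministic host order for reproducible output.
        Map<String, List<Split>> byHost = new TreeMap<>();
        for (Split s : splits) {
            byHost.computeIfAbsent(s.host, h -> new ArrayList<>()).add(s);
        }
        List<List<Split>> combined = new ArrayList<>();
        for (List<Split> hostSplits : byHost.values()) {
            List<Split> current = new ArrayList<>();
            long bytes = 0;
            for (Split s : hostSplits) {
                if (!current.isEmpty() && bytes + s.length > maxBytes) {
                    combined.add(current);
                    current = new ArrayList<>();
                    bytes = 0;
                }
                current.add(s); // zero-length splits are kept like any other
                bytes += s.length;
            }
            if (!current.isEmpty()) {
                combined.add(current);
            }
        }
        return combined;
    }
}
```

With a 64-byte cap, splits a (10 bytes, h1), b (0 bytes, h2), c (60 bytes, h1), and d (0 bytes, h1) come out as [a] and [c, d] on h1 plus [b] on h2, so both zero-length splits survive.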

I want to use elephant-bird to write a Twitter crawl to Elasticsearch. How to do this is described here: http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/pig.html I don't need any parsing, but I need valid...

Hi, I am trying to extract entries from a tf-idf SequenceFile that I created with seq2sparse. I can read and extract the content, but I need to create a new SequenceFile with...

The default Thrift deserializer is very lenient and silently ignores anything that does not quite make sense. Consumers almost always prefer the Thrift deserializer to fail when a serialized record has...

There is a cast at https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapred/output/DeprecatedOutputFormatWrapper.java#L100 that can fail:

org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
Caused by: java.lang.ClassCastException: org.apache.hadoop.mapred.Reporter$1 cannot be cast to org.apache.hadoop.mapreduce.StatusReporter
    at com.twitter.elephantbird.mapred.output.DeprecatedOutputFormatWrapper$RecordWriterWrapper.<init>(DeprecatedOutputFormatWrapper.java:98)
    at com.twitter.elephantbird.mapred.output.DeprecatedOutputFormatWrapper.getRecordWriter(DeprecatedOutputFormatWrapper.java:84)
    at cascading.tap.hadoop.io.TapOutputCollector.initialize(TapOutputCollector.java:102)
    at cascading.tap.hadoop.io.TapOutputCollector.<init>(TapOutputCollector.java:79)
    at cascading.tap.hadoop.io.TapOutputCollector.<init>(TapOutputCollector.java:68)
    at cascading.tap.hadoop.io.HadoopTupleEntrySchemeCollector.makeCollector(HadoopTupleEntrySchemeCollector.java:57)...
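Reporter$1 is an anonymous class nested in Reporter, so code that assumes every old-API reporter is also a new-API StatusReporter fails on it. One defensive pattern is to check the type and adapt instead of casting unconditionally. The sketch below uses hypothetical stand-in types (OldReporter, NewStatusReporter, ReporterBridge) rather than the real Hadoop classes, which have more methods; it is not taken from elephant-bird.

```java
import java.util.Objects;

// Hypothetical stand-in for org.apache.hadoop.mapred.Reporter (old API).
interface OldReporter {
    void progress();
    void setStatus(String status);
}

// Hypothetical stand-in for org.apache.hadoop.mapreduce.StatusReporter (new API).
abstract class NewStatusReporter {
    abstract void progress();
    abstract void setStatus(String status);
}

final class ReporterBridge {
    private ReporterBridge() {}

    // Returns the reporter itself when it already is a NewStatusReporter;
    // otherwise wraps it in an adapter instead of risking a ClassCastException.
    static NewStatusReporter toStatusReporter(OldReporter reporter) {
        Objects.requireNonNull(reporter, "reporter");
        if (reporter instanceof NewStatusReporter) {
            return (NewStatusReporter) reporter;
        }
        final OldReporter old = reporter;
        return new NewStatusReporter() {
            @Override
            void progress() { old.progress(); }

            @Override
            void setStatus(String status) { old.setStatus(status); }
        };
    }
}
```

The instanceof check keeps the fast path for objects that implement both APIs, while anonymous or third-party reporters (such as the one Cascading passes in the trace above) get delegated to instead of cast.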