elephant-bird
Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.
I'm working on support for nested structures.
$ git clone git://github.com/a-b/elephant-bird.git Cloning into elephant-bird... remote: Counting objects: 11298, done. remote: Compressing objects: 100% (2235/2235), done. remote: Total 11298 (delta 8165), reused 10893 (delta 7875) Receiving objects: 100%...
Hi! I've just found an issue using JsonStringToMap on Pig 0.8.1. The schema `"json: [chararray]"` throws a `ParseException`. The following works on Pig 0.8.1:
```
return Utils.getSchemaFromString("json: []", DataType.CHARARRAY);
```
...
JSON is currently handled a little weirdly. Pig can read JSON from any file input format; however, MapReduce jobs can only read JSON from LZO files. Additionally, parsing is...
data = LOAD 'hdfs://localhost//foo/23,hdfs://localhost/foo/24' USING com.twitter.elephantbird.pig.load.SequenceFileLoader ()

produces:

## Backend error message during job submission
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: java.net.URISyntaxException: Illegal character in scheme name at index 0: 23,hdfs:
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:280)...
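For illustration only: the `23,hdfs:` fragment in the message suggests that a piece of the comma-joined location string reached a URI parser without being split first. Below is a minimal sketch of splitting the locations before parsing, using only `java.net.URI`. `LocationSplitter` and `split` are hypothetical names, not part of Pig or elephant-bird, and this is not the project's actual fix. Note that a naive comma split would break paths that legitimately contain commas, such as `{a,b}` globs.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Pig or elephant-bird.
final class LocationSplitter {
    private LocationSplitter() {}

    // Splits a comma-separated location string and parses each piece on its own.
    // URI.create throws IllegalArgumentException if a piece is not a valid URI.
    static List<URI> split(String locations) {
        List<URI> uris = new ArrayList<>();
        for (String part : locations.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                uris.add(URI.create(trimmed));
            }
        }
        return uris;
    }

    public static void main(String[] args) {
        // Same location string as in the report above.
        for (URI uri : split("hdfs://localhost//foo/23,hdfs://localhost/foo/24")) {
            System.out.println(uri.getScheme() + " " + uri.getPath());
        }
    }
}
```

Each element of the returned list can then be handed to the input format separately, assuming the loader accepts multiple input paths.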
With Raghu's refactoring, we can set the inputFormat and outputFormat as follows:
```
job.setInputFormatClass(
    LzoProtobufB64LineInputFormat.getInputFormatClass(MyProtobufClass.class, conf)
);
```
We need to do the same for Writables, as this: `job.setOutputValueClass(ThriftWritable.class);` doesn't...
Bumps [protobuf-java](https://github.com/protocolbuffers/protobuf) from 2.4.1 to 3.16.3. Release notes Sourced from protobuf-java's releases. Protobuf Release v3.16.3 Java Refactoring java full runtime to reuse sub-message builders and prepare to migrate parsing logic...
This PR was automatically created by Snyk using the credentials of a real user. Snyk has created this PR to upgrade joda-time:joda-time from 1.6 to 1.6.2. :information_source: Keep your dependencies up-to-date....
Snyk has created this PR to upgrade org.apache.crunch:crunch-core from 0.8.2 to 0.15.0. :information_source: Keep your dependencies up-to-date. This makes it easier to fix existing vulnerabilities and to more quickly identify...
Snyk has created this PR to upgrade org.apache.hadoop:hadoop-client from 1.1.2 to 1.2.1. :information_source: Keep your dependencies up-to-date. This makes it easier to fix existing vulnerabilities and to more quickly identify...