dfs-datastores

Dead-simple vertical partitioning, compression, appends, and consolidation of data on a distributed filesystem.

Results 28 dfs-datastores issues

I updated the dependencies in my project.clj file and ran `lein deps`. I checked that it's in my `~/.m2/repository/` folder. I tried requiring the project in the REPL in multiple ways `(require...

I'd like to add parquet support in addition to sequence files.

A simple test file:
```
public class PailExampleTest {
    @SuppressWarnings({ "unchecked", "rawtypes" })
    public static void main(String[] args) throws IOException {
        Pail pail = Pail.create("mypail");
        TypedRecordOutputStream os = pail.openWrite();
        os.writeObject(new...
```

I'm reading "Big Data" with great interest and am trying to implement the batch layer with a graph data model as the master dataset. I always get a NullPointerException when I call the...

I am having a problem with the Lambda Architecture. Our data stored in HDFS is in a fact-based Pail format using Thrift serialization schemes and vertical partitioning. Is there any direct...

Here is some code to reproduce the issue:
``` sh
$ cd dfs-datastores/dfs-datastores/
$ lein uberjar
Compiling 53 source files to /Users/bhiles/src/dfs-datastores/dfs-datastores/target/classes
warning: [options] bootstrap class path not set in...
```

The VersionedTap's sourceConfInit method overrides mapreduce.input.fileinputformat.inputdir in: https://github.com/nathanmarz/dfs-datastores/blob/master/dfs-datastores-cascading/src/main/java/com/backtype/cascading/tap/VersionedTap.java#L96 As a result, we can't define a cascading.tap.MultiTap as a list of VersionedTaps out of the box, as each VersionedTap is going...

I'm working on an AvroFileFormat, and I'd like to be able to pass it args -- this seemed like the least bad way to make it happen. Open to suggestions...

Change the SequenceFilePailInputFormat to use the CombineFileInputFormat. This should reduce the number of input splits for Pail sources. In my tests, several thousand splits were reduced to one. There is...

These are a couple of enhancements to the Consolidator. 1. At the end of the consolidator mapper I added a check on the delete of the source file and throw...