dfs-datastores

Dead-simple vertical partitioning, compression, appends, and consolidation of data on a distributed filesystem.

Results 28 dfs-datastores issues

TypedRecordOutputStream.writeObject(obj) should check the type of the object being written to the Pail. Pail.toString() is also overridden to print Pail info more nicely (not just for this issue, of course). The result is the following...

When I create a new versioned tap (using VersionedKeyValSource from Scalding) I get an NPE:

Caused by: java.lang.NullPointerException
    at org.apache.hadoop.mapred.FileInputFormat.getPathStrings(FileInputFormat.java:342)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:288)
    at com.backtype.cascading.tap.VersionedTap.sourceConfInit(VersionedTap.java:88)
    at com.backtype.cascading.tap.VersionedTap.sourceConfInit(VersionedTap.java:19)
    at cascading.flow.hadoop.HadoopFlowStep.initFromSources(HadoopFlowStep.java:332)
    at cascading.flow.hadoop.HadoopFlowStep.getInitializedConfig(HadoopFlowStep.java:99)...

I have a slightly modified implementation of PailStructure in which I store some state (say, myvar) in the implementing object, which is used for serialization and deserialization. My code looks something like...

Right now, a PailRecordWriter can open an unlimited number of files. Instead of using a HashMap to contain the mapping of attributes to open files, PailRecordWriter should use a LinkedHashMap...
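The LinkedHashMap idea in the issue above can be sketched roughly as follows. This is only an illustration, not the project's code: `BoundedWriterCache` and `maxOpen` are hypothetical names, and the sketch assumes each open file is represented by something `Closeable`. It relies on two standard `LinkedHashMap` features: access ordering, so entries are kept from least to most recently used, and the `removeEldestEntry` hook, which runs after every insertion.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical LRU cache of open writers (not part of dfs-datastores).
 * When more than maxOpen writers are held, the least recently used one
 * is closed and dropped from the map.
 */
class BoundedWriterCache<K, W extends Closeable> extends LinkedHashMap<K, W> {
    private static final long serialVersionUID = 1L;

    private final int maxOpen; // assumed upper bound on simultaneously open files

    BoundedWriterCache(int maxOpen) {
        // accessOrder = true: get() and put() move an entry to the "most recent" end
        super(16, 0.75f, true);
        this.maxOpen = maxOpen;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, W> eldest) {
        if (size() <= maxOpen) {
            return false; // still within the limit, keep everything
        }
        try {
            eldest.getValue().close(); // release the file handle before eviction
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true; // tell LinkedHashMap to remove the eldest entry
    }
}
```

With a cache like this, a record writer that later sees an evicted attribute again would need to open a new file for it, so the limit trades a bounded number of handles for potentially more output files.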

@johnynek, moved your issue over to here.

Exception in thread "flow Tutorial6" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: manhattansink:kv.test:LATEST
    at org.apache.hadoop.fs.Path.initialize(Path.java:148)
    at org.apache.hadoop.fs.Path.<init>(Path.java:126)
    at cascading.tap.hadoop.util.Hadoop18TapUtil.cleanupTapMetaData(Hadoop18TapUtil.java:185)
    at cascading.flow.hadoop.HadoopFlowStep.cleanTapMetaData(HadoopFlowStep.java:272)
    at cascading.flow.hadoop.HadoopFlowStep.clean(HadoopFlowStep.java:257)...

The Pail.create() methods allow a FileSystem to be used in the creation of the pail. However, when creating a snapshot of a pail the only option is to include the...

Since "/tmp/filecopy" is hardcoded in FileCopyInputFormat as tmproot, if a snapshot is run and the user running the process cannot write to "/tmp" or "/tmp/filecopy", the snapshot will...

When PailTap is used, sink taps do not use randomized names as expected. Instead, they use the default Hadoop naming scheme.

expected:
```
target/randomized.pailfile
```
actual:
```
target/part-00000.pailfile
```
I've...

Usable compression codecs are currently determined at compile time. Allow the user to inject additional codecs programmatically without the need for dfs-datastores to know about them at compile time. Also,...

The default SkipStrategy in Cascading 2.1 and greater uses the timestamp of the directory returned by getPath(), which in the case of a Pail doesn't change. We need to override...