
Add an option to read the first N lines from a file

Open philc opened this issue 6 years ago • 0 comments

Pulling only the first N lines from a file would be a useful option during development, when you want to read just a handful of lines out of a very large file to test your code.

You can approximate this behavior by using a reducer which exits early:

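;; original-reducer stands for whatever reducing function you already use.
(let [lines-read (atom 0)               ; number of records consumed so far
      wrapped-reducer (fn [acc v]
                        (if (>= @lines-read 10)
                          ;; Limit reached: stop the reduction early.
                          (do
                            (println "10 lines have been read")
                            (reduced acc))
                          ;; Otherwise count the record and delegate.
                          (do (swap! lines-read inc)
                              (original-reducer acc v))))]
  ..)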
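
For experimenting with the same early-exit pattern outside of dendrite, here is a minimal self-contained sketch. The helper name `take-n-reducer` is hypothetical, and a plain `range` stands in for the records of a large file; nothing here calls dendrite's API.

```clojure
;; Hypothetical helper (not part of dendrite): wraps a reducing function rf
;; so that the reduction stops once n items have been consumed.
(defn take-n-reducer [n rf]
  (let [items-read (atom 0)]
    (fn [acc v]
      (if (>= @items-read n)
        (reduced acc)                  ; limit reached: short-circuit the reduce
        (do (swap! items-read inc)
            (rf acc v))))))

;; A plain range stands in for the records of a very large file.
(def first-ten
  (reduce (take-n-reducer 10 conj) [] (range 1000000)))
```

The wrapper is stateful, so create a fresh one for each reduction. Note that it only stops when the item after the limit arrives, which matches the snippet above.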

However, this still takes a few seconds. From @jwhitbeck:

Indeed, dendrite is currently optimized for throughput and reads far ahead of those ten lines. The read process has two stages: (1) deserialize each active column into arrays of values, and (2) assemble the nested records from the flattened columnar layout. Your wrapped-reducer short-circuits (2) but doesn't impact (1).
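
To make the two-stage explanation concrete, here is a toy model, not dendrite's actual implementation. All names (`decode-column`, `read-first-n`, `decoded-count`) are invented for illustration. Stage 1 is modeled as an eager pass over each column and stage 2 as record assembly that stops after n records.

```clojure
;; Toy model only: none of these names come from dendrite.
(def decoded-count (atom 0))

;; Stage 1 stand-in: eagerly "deserialize" a whole column into a vector,
;; counting every value that gets decoded.
(defn decode-column [raw-values]
  (mapv (fn [v] (swap! decoded-count inc) v) raw-values))

;; Stage 2 stand-in: assemble records from the already-decoded columns
;; and keep only the first n of them.
(defn read-first-n [n ids names]
  (let [id-col   (decode-column ids)
        name-col (decode-column names)
        records  (map (fn [id nm] {:id id :name nm}) id-col name-col)]
    (into [] (take n) records)))

(def result
  (read-first-n 10 (range 100000) (map str (range 100000))))
```

In this model `result` holds only 10 records, yet `@decoded-count` shows that every value in both columns was decoded first. Stopping early in the assembly step saves nothing in the decoding step, which is the effect described above.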

Adding a "line-count" option is definitely the way to go, but isn't easy given the current implementation. More generally, I have a design for adding indexing/filtering capabilities and buffering less when it isn't needed which should make dendrite much snappier for interactive use.

philc • Mar 27 '18 17:03