RecordStream
Multiplex output to files?
Occasionally I reach for recs multiplex when I want to split a record stream into multiple files. For example, recs multiplex -k foo -- recs-tocsv works great, except all of the CSV output goes to stdout. When I'm using an output format without a distinct marker to split on, I usually work around this limitation using some combination of recs piped to parallel running the recs to... command. recs chain and recs generate seem like they would almost allow me to multiplex to separate files, but either generate needs to support outputting non-records or chain needs to support some sort of interpolation like generate (ick).
In terms of supporting this feature, I see two options:
- Build support into multiplex itself. Something like --output-filename-key=<keyspec> or --output-filename=<snippet> to name the file that output is written to for each group. The filename key or evaluated snippet would be added to the set of keys records are grouped upon.
- Add a new operation which enables use of the existing multiplex to do this, for example: recs multiplex -k foo -- recs-tofiles -k filename -- recs-tocsv
I think option one is cleaner than option two, both in terms of implementation and command line syntax. Option two, however, is implementable outside of core recs.
Is this feature worth having in core recs? General thoughts?
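To make the requested behavior concrete, here is a minimal Python sketch of "one output file per group". It is not RecordStream code: the function name multiplex_to_csv_files, the <key>-<value>.csv naming, and the assumption that records arrive as one JSON object per line are all illustrative choices, and it does not sanitize group values that contain path separators.

```python
import csv
import json
from collections import defaultdict
from pathlib import Path


def multiplex_to_csv_files(lines, key, out_dir):
    """Group JSON-lines records by `key` and write one CSV file per group.

    Hypothetical helper for illustration only; not part of recs.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Records missing the key all land in the "" group.
        groups[str(record.get(key, ""))].append(record)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for value, records in groups.items():
        # Header is the union of keys in the group, in first-seen order.
        fields = list(dict.fromkeys(k for r in records for k in r))
        path = out_dir / f"{key}-{value}.csv"  # assumed naming scheme
        with path.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fields)
            writer.writeheader()
            writer.writerows(records)
        written[value] = path
    return written
```

Calling multiplex_to_csv_files(sys.stdin, "foo", "out") would then play the role of the multiplex-to-files pipeline discussed here, with each distinct foo value getting its own CSV file.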
Hmmm I'm cool with having it in core recs, just not certain what the interface should be... I think it seems reasonable to me to have multiplex be able to do it...
Another option would be to add a -o flag to all recs commands, like --filename-key that lets you output to a named file (which seems reasonable) and then let multiplex be able to interpolate command names based on a clumping record...
Would be cool to have the latter, but the former is much more usable. I'll also ping @amling to see what he thinks
I'm also not sure the right combination of primitives to pull this off, but here are some ideas:
Something we've thought about previously was having line output commands take a --records to output a single key ("LINE" or the like) record instead. This means inside multiplex you'd get your bucket stamped back on that so recs-multiplex -k foo -- recs-tocsv --records would have output records with a "foo" field and a "LINE" field.
That's sort of the minimum of multiplex+tocsv not destroying the data.
After that the best primitive to sort into files is not very clear, especially because you want to sort into file by "foo" field but then also eval down to the "LINE" field. That alone doesn't seem like a great primitive, but maybe it would be OK? recs-tofiles --file <snippet> would write records (not what you want here but we'd allow it), --line <snippet> would write the evaluation of the snippet.
End-to-end this makes it:
recs-multiplex -k foo -- recs-tocsv --records | recs-tofiles --file '{{foo}}' --line '{{LINE}}'
We could also split --file into --file-key (-f) and --file-eval (-F) and likewise --line into --line-key (-l) and --line-eval (-L):
recs-multiplex -k foo -- recs-tocsv --records | recs-tofiles -f foo -l LINE
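The two-stage idea above can be sketched in Python as follows. This is only an illustration of the data flow, not an implementation of the proposed flags: csv_line_records stands in for recs-tocsv --records running under multiplex, write_lines_to_files stands in for the proposed recs-tofiles -f/-l, and both names are hypothetical. It also assumes that each bucket gets its own CSV header line.

```python
import csv
import io
from pathlib import Path


def _csv_line(values):
    """Format one row as a single CSV line without a trailing newline."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(values)
    return buffer.getvalue().rstrip("\n")


def csv_line_records(records, key, fields):
    """Stage 1: emit {key, LINE} records, with the bucket stamped on each line."""
    seen = set()
    for record in records:
        bucket = record[key]
        if bucket not in seen:
            # Assumption: every bucket starts with its own header line.
            seen.add(bucket)
            yield {key: bucket, "LINE": _csv_line(fields)}
        yield {key: bucket, "LINE": _csv_line([record.get(f, "") for f in fields])}


def write_lines_to_files(line_records, file_key, line_key, out_dir):
    """Stage 2: write each record's line to the file named by its file key."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    handles = {}
    try:
        for record in line_records:
            name = str(record[file_key])
            if name not in handles:
                handles[name] = (out_dir / name).open("w")
            handles[name].write(record[line_key] + "\n")
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(handles)
```

The sketch shows why the intermediate records need both fields: the "foo" value picks the file, while "LINE" carries the already-formatted output.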
Keith
On Wed, May 06, 2015 at 10:31:11AM -0700, Ben Bernard wrote:
> Hmmm I'm cool with having it in core recs, just not certain what the interface should be... I think it seems reasonable to me to have multiplex be able to do it...
> Another option would be to add a -o flag to all recs commands, like --filename-key that lets you output to a named file (which seems reasonable) and then let multiplex be able to interpolate command names based on a clumping record...
> Would be cool to have the latter, but the former is much more usable. I'll also ping @amling to see what he thinks
Keith and I talked about this for a long while today... we think probably the best thing to do is to build it into multiplex...
recs-multiplex -k foo -o foo -- recs-tocsv
would output to a file named foo-FOO_VALUE for each clump, containing the output of tocsv.
Similarly, you could use -O to provide evaluable Perl that generates the filename:
recs-multiplex -k foo -O '"myawesomefile-{{foo}}.recs"'
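As a rough illustration of the two naming schemes, the following Python sketch derives a filename for one clump. The function names are hypothetical, and the -O case is approximated with plain {{key}} substitution rather than evaluating Perl, so it only covers templates like the one above and does not handle nested keyspecs.

```python
import re


def filename_from_key(record, key):
    """-o style: name the file '<key>-<value>', e.g. foo-FOO_VALUE."""
    return f"{key}-{record[key]}"


def filename_from_template(record, template):
    """-O style, approximated: replace each {{key}} with the record's value.

    Assumption: simple top-level keys only; this is not a Perl snippet evaluator.
    """
    def lookup(match):
        return str(record[match.group(1)])

    return re.sub(r"\{\{(.*?)\}\}", lookup, template)
```

For a clump whose records have foo set to bar, the first function yields foo-bar and the second turns myawesomefile-{{foo}}.recs into myawesomefile-bar.recs.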
We thought about tofiles for a long time, but in the end it just seemed to be duplicating multiplex clumping without much value....
Thoughts?
Sounds good! I agree about duplicating the multiplex clumping without much value, and that's why I also had settled on option one instead of option two when thinking this through.
Unless you or Keith have a burning desire to implement this, I'll probably take a swing at it in the next few weeks.