TimeSeries.jl

Support reading data from DataStreams

Open milktrader opened this issue 9 years ago • 6 comments

milktrader avatar Oct 16 '16 00:10 milktrader

This enhancement request (supporting DataStreams.jl) was initially submitted by @nalimilan: https://github.com/JuliaStats/TimeSeries.jl/issues/290#issuecomment-254007499

Pinging @quinnj: maybe you can help with this?

Code to convert a DataFrame to a TimeArray and a TimeArray to a DataFrame can be found here: https://github.com/femtotrader/TimeSeriesIO.jl/blob/master/src/TimeSeriesIO.jl

It could help to build a TimeArray.Sink.

A TimeArray.Source (to convert from TimeArray to DataStream) would also be a nice feature to have.

If @milktrader doesn't want to add additional dependencies to TimeSeries.jl, this code can be part of TimeSeriesIO.jl

Related issues:

  • https://github.com/femtotrader/TimeSeriesIO.jl/issues/3
  • https://github.com/femtotrader/TimeSeriesIO.jl/issues/11

femtotrader avatar Nov 12 '16 09:11 femtotrader

Yes, I think this important functionality belongs in a separate package. Some other possible names ...

  • TimeSeriesTools (this might be too general)
  • TimeSeriesStreams

TimeSeriesIO is actually not bad for a package name either.

milktrader avatar Nov 12 '16 16:11 milktrader

The point of the DataStreams framework is that you wouldn't have to depend on DataFrames, just on DataStreams.jl, and you'd get support for streaming from/to any source, like DataFrame, CSV, databases, etc.
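The decoupling described above can be sketched in a few lines. This is only an illustration in Python, not the DataStreams.jl API: `csv_source`, `records_source`, `TimeSeriesSink`, and `stream` are hypothetical names, and the sink is a stand-in for a TimeArray. The sink depends only on the shared row protocol, so any source that follows the protocol can feed it.

```python
import csv
import io


def csv_source(text):
    """Source: yield each CSV record as a dict of column name -> string."""
    yield from csv.DictReader(io.StringIO(text))


def records_source(records):
    """Source: yield rows from an in-memory list of dicts."""
    yield from records


class TimeSeriesSink:
    """Sink: collect timestamps and numeric values (hypothetical TimeArray stand-in)."""

    def __init__(self, time_column):
        self.time_column = time_column
        self.timestamps = []
        self.values = []

    def write(self, row):
        # Split each incoming row into its timestamp and its numeric columns.
        self.timestamps.append(row[self.time_column])
        self.values.append(
            {k: float(v) for k, v in row.items() if k != self.time_column}
        )


def stream(source, sink):
    """Push every row of the source into the sink; neither knows the other's type."""
    for row in source:
        sink.write(row)
    return sink


csv_text = "date,close\n2016-01-04,105.35\n2016-01-05,102.71\n"
from_csv = stream(csv_source(csv_text), TimeSeriesSink("date"))
from_memory = stream(
    records_source([{"date": "2016-01-04", "close": 105.35}]),
    TimeSeriesSink("date"),
)
```

The same `TimeSeriesSink` accepts both sources without importing either one, which is the property the comment above attributes to the DataStreams framework.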

nalimilan avatar Nov 13 '16 10:11 nalimilan

Why not have DataStreams.jl support TimeSeries, like it supports DataFrames?

DataFrames does not support DataStreams.jl

milktrader avatar Nov 13 '16 14:11 milktrader

I still have some difficulty understanding the functional differences between DataStreams.jl and IterableTables.jl.

Maybe @davidanthoff and @quinnj can help clarify.

femtotrader avatar Jul 10 '17 20:07 femtotrader

In terms of goals the two packages are super similar. IterableTables.jl emerged out of the design of Query.jl, where the design of IterableTables.jl (namely iterators of NamedTuples.jl) forms the core of the most common backend.

In terms of design, the main difference currently is that IterableTables.jl only has one way of streaming data, namely row by row (where each row is a named tuple). DataStreams.jl offers two different options: you can either stream field by field or column by column.
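To make the three streaming styles concrete, here is a minimal Python sketch over a small column-wise table. It is an analogy only: the function names and the sample data are made up for illustration and do not correspond to the actual IterableTables.jl or DataStreams.jl interfaces.

```python
from collections import namedtuple
from datetime import date

# Hypothetical in-memory table stored column-wise (illustrative data only).
columns = {
    "timestamp": [date(2016, 1, 4), date(2016, 1, 5), date(2016, 1, 6)],
    "close": [105.35, 102.71, 100.70],
}
names = list(columns)
nrows = len(columns["timestamp"])

Row = namedtuple("Row", names)


def stream_rows():
    """Row by row: yield one named tuple per row."""
    for i in range(nrows):
        yield Row(*(columns[name][i] for name in names))


def stream_fields():
    """Field by field: yield one (row index, column index, value) cell at a time."""
    for i in range(nrows):
        for j, name in enumerate(names):
            yield i, j, columns[name][i]


def stream_columns():
    """Column by column: yield each whole column at once."""
    for name in names:
        yield name, columns[name]


rows = list(stream_rows())
fields = list(stream_fields())
cols = list(stream_columns())
```

All three traversals carry the same data; they differ in the unit handed to the consumer (a row, a single cell, or an entire column).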

There are more sinks and sources for IterableTables.jl currently (more than a dozen as of right now). In particular, if you implement the IterableTables.jl interface, you get automatic interop with the DataStreams.jl sources and sinks via their field-based streaming (but not with the column-by-column streaming). One other difference is in the details of the integration with Query.jl: while you can query a DataStreams.jl source, you should generally get a smoother experience if you query an IterableTables.jl source because there are fewer wrapper steps involved. The same applies if you materialize a query into some tabular structure.
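A rough Python sketch of why a row iterator adapts naturally to field-based streaming: each row can be unpacked into cells as it arrives, with no buffering. The helper `fields_from_rows` is hypothetical and is not how the two packages actually interoperate. Producing whole columns from the same iterator would require collecting every row first, which is one plausible reading (not stated in the thread) of why the column-based path is not covered.

```python
from collections import namedtuple

Row = namedtuple("Row", ["timestamp", "close"])


def fields_from_rows(rows):
    """Adapt a row iterator to field-based streaming, lazily, one cell at a time."""
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            yield i, j, value


row_iter = iter([Row("2016-01-04", 105.35), Row("2016-01-05", 102.71)])
cells = list(fields_from_rows(row_iter))
```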

There are also some user API differences that should be fairly obvious if you just look at the examples of how to use the two packages.

I don't think we have ever done a performance comparison between the two approaches.

davidanthoff avatar Jul 11 '17 08:07 davidanthoff