YAXArrays.jl

Integration with Tables.jl

Open s-celles opened this issue 1 year ago • 8 comments

Hello,

I'd like to know whether integration with Tables.jl (https://tables.juliadata.org/dev/) has been considered, in order to export a slice of a YAXArray to DataFrames.DataFrame, TimeSeries.TimeArray, TSFrames.TSFrame, etc. Maybe YAXArray could be both a source and a sink. Any opinions?

Kind regards

s-celles • Jan 04 '24 22:01

It looks like this is already supported: https://rafaqz.github.io/DimensionalData.jl/dev/reference/?h=dimtable#tablesjltabletraitsjl-interface. Maybe we could just try it out with some examples and, if it works, add them to the docs? What simple examples do you have in mind?
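
For reference, here is a minimal sketch of the Tables.jl route linked above, assuming DimensionalData.jl's `DimTable` wrapper and DataFrames.jl are installed. The array, its dimension values, and the `:value` name are made up for illustration. A YAXArray is expected to behave the same way because it builds on DimensionalData, but that is an assumption and has not been checked here.

```julia
using DimensionalData
using DataFrames

# Illustrative 3×2 array; the dimension values and the name are arbitrary.
A = DimArray(rand(3, 2), (X(1:3), Y([:a, :b])); name=:value)

# DimTable exposes the array as a long-format Tables.jl table:
# one row per element, one column per dimension plus a data column.
t = DimTable(A)

# Any Tables.jl sink can consume it, for example a DataFrame.
df = DataFrame(t)
nrow(df)  # 6, one row per array element
```

Going through `DimTable` keeps the conversion on the DimensionalData side, so no table package would have to become a dependency of YAXArrays itself.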

lazarusA • Jan 04 '24 22:01

I see two kinds of examples.

YAXArray as sink: download data for 3 symbols from MarketData.jl (for example) and build a "cube".

YAXArray as source: take the previously obtained cube, swap 2 dimensions and get a DataFrame of OHLCV values at a given date, or get a TSFrame of close prices with the symbols as columns...

Such a library shouldn't be added as a dependency of YAXArrays, so you will probably have to deal with package extensions: https://youtu.be/TiIZlQhFzyk?si=Lvm6RSp3WjuqtV-o

Another idea, if you don't want to rely on remote data, could be to generate similar data with a random walk.

s-celles • Jan 05 '24 08:01

Here is some random data to build a 3D cube:

julia> using MarketData

julia> data = Dict("Stock1" => random_ohlcv(), "Stock2" => random_ohlcv(), "Stock3" => random_ohlcv())
Dict{String, TimeArray{Float64, 2, DateTime, Matrix{Float64}}} with 3 entries:
  "Stock2" => 500×5 TimeArray{Float64, 2, DateTime, Matrix{Float64}} 2020-01-01T00:00:00 to 2020-01-21T19:00:00
  "Stock3" => 500×5 TimeArray{Float64, 2, DateTime, Matrix{Float64}} 2020-01-01T00:00:00 to 2020-01-21T19:00:00
  "Stock1" => 500×5 TimeArray{Float64, 2, DateTime, Matrix{Float64}} 2020-01-01T00:00:00 to 2020-01-21T19:00:00

julia> data["Stock1"]
500×5 TimeArray{Float64, 2, DateTime, Matrix{Float64}} 2020-01-01T00:00:00 to 2020-01-21T19:00:00
┌─────────────────────┬────────┬────────┬────────┬────────┬────────┐
│                     │ Open   │ High   │ Low    │ Close  │ Volume │
├─────────────────────┼────────┼────────┼────────┼────────┼────────┤
│ 2020-01-01T00:00:00 │ 654.02 │ 657.91 │ 652.74 │ 657.91 │   47.8 │
│ 2020-01-01T01:00:00 │ 657.59 │ 663.22 │ 656.93 │ 658.29 │   55.2 │
│ 2020-01-01T02:00:00 │ 658.09 │  662.2 │  649.3 │  649.3 │    3.7 │
│ 2020-01-01T03:00:00 │ 649.57 │ 649.57 │ 634.44 │ 636.65 │   13.9 │
│ 2020-01-01T04:00:00 │ 637.35 │ 639.31 │ 635.88 │ 635.88 │   35.8 │
│ 2020-01-01T05:00:00 │  635.6 │ 636.46 │ 626.38 │ 628.16 │   68.8 │
│ 2020-01-01T06:00:00 │ 627.61 │ 629.29 │ 622.35 │ 629.29 │   27.1 │
│ 2020-01-01T07:00:00 │ 630.18 │ 637.41 │ 630.18 │ 634.59 │   39.0 │
│ 2020-01-01T08:00:00 │ 634.84 │ 635.42 │ 626.56 │ 626.56 │   26.7 │
│ 2020-01-01T09:00:00 │ 625.98 │ 627.14 │ 622.37 │ 626.96 │    8.7 │
│ 2020-01-01T10:00:00 │ 627.76 │ 636.52 │ 627.67 │  634.8 │   79.7 │
│ 2020-01-01T11:00:00 │ 634.71 │ 635.36 │ 629.06 │ 629.65 │   70.6 │
│          ⋮          │   ⋮    │   ⋮    │   ⋮    │   ⋮    │   ⋮    │
│ 2020-01-21T08:00:00 │  793.7 │ 795.42 │ 785.97 │ 786.96 │   63.8 │
│ 2020-01-21T09:00:00 │ 787.38 │  791.3 │ 785.83 │ 785.83 │    0.0 │
│ 2020-01-21T10:00:00 │ 786.02 │ 793.74 │ 784.98 │ 793.74 │   71.2 │
│ 2020-01-21T11:00:00 │ 794.73 │ 795.11 │ 790.71 │ 790.71 │   76.3 │
│ 2020-01-21T12:00:00 │ 789.92 │ 790.87 │ 786.32 │ 787.38 │   42.7 │
│ 2020-01-21T13:00:00 │ 788.26 │ 788.33 │ 782.01 │ 782.48 │   61.6 │
│ 2020-01-21T14:00:00 │ 781.58 │ 782.98 │ 777.93 │ 782.13 │   31.2 │
│ 2020-01-21T15:00:00 │ 781.66 │ 782.95 │ 774.77 │ 779.68 │   44.5 │
│ 2020-01-21T16:00:00 │ 779.35 │ 784.95 │ 773.43 │ 784.95 │   34.2 │
│ 2020-01-21T17:00:00 │ 785.61 │ 789.73 │ 783.63 │  787.8 │   50.2 │
│ 2020-01-21T18:00:00 │ 787.51 │ 794.35 │ 787.37 │ 792.83 │    3.5 │
│ 2020-01-21T19:00:00 │ 792.87 │  794.0 │ 790.51 │ 793.18 │   16.9 │
└─────────────────────┴────────┴────────┴────────┴────────┴────────┘
                                                    476 rows omitted

julia> data["Stock2"]
500×5 TimeArray{Float64, 2, DateTime, Matrix{Float64}} 2020-01-01T00:00:00 to 2020-01-21T19:00:00
┌─────────────────────┬────────┬────────┬────────┬────────┬────────┐
│                     │ Open   │ High   │ Low    │ Close  │ Volume │
├─────────────────────┼────────┼────────┼────────┼────────┼────────┤
│ 2020-01-01T00:00:00 │  155.8 │ 167.25 │ 154.93 │ 165.42 │   40.8 │
│ 2020-01-01T01:00:00 │ 164.48 │ 167.51 │ 162.54 │ 165.19 │   29.5 │
│ 2020-01-01T02:00:00 │ 165.66 │ 171.29 │ 164.89 │ 165.11 │   55.0 │
│ 2020-01-01T03:00:00 │ 164.35 │ 169.62 │ 164.35 │ 165.48 │   13.2 │
│ 2020-01-01T04:00:00 │ 165.26 │ 168.44 │ 164.23 │ 165.34 │   97.3 │
│ 2020-01-01T05:00:00 │ 166.05 │ 171.79 │  166.0 │  170.8 │   62.7 │
│ 2020-01-01T06:00:00 │ 170.63 │ 174.14 │ 170.17 │ 174.02 │   66.8 │
│ 2020-01-01T07:00:00 │ 174.49 │ 179.76 │ 174.49 │ 178.54 │   40.5 │
│ 2020-01-01T08:00:00 │  177.8 │ 179.85 │ 175.84 │ 176.01 │   63.8 │
│ 2020-01-01T09:00:00 │ 176.92 │ 181.39 │ 174.55 │ 176.26 │   50.3 │
│ 2020-01-01T10:00:00 │ 175.69 │ 176.43 │ 171.21 │ 172.28 │   59.0 │
│ 2020-01-01T11:00:00 │ 172.14 │ 177.01 │ 168.63 │ 175.23 │   90.2 │
│          ⋮          │   ⋮    │   ⋮    │   ⋮    │   ⋮    │   ⋮    │
│ 2020-01-21T08:00:00 │  149.9 │ 151.54 │ 146.31 │ 150.34 │   98.0 │
│ 2020-01-21T09:00:00 │ 150.64 │ 151.86 │ 145.85 │ 148.63 │   89.7 │
│ 2020-01-21T10:00:00 │ 149.62 │ 152.04 │ 144.73 │ 149.19 │   87.3 │
│ 2020-01-21T11:00:00 │ 148.48 │ 150.29 │ 140.75 │ 141.65 │   35.2 │
│ 2020-01-21T12:00:00 │ 142.39 │ 142.39 │ 137.89 │ 142.14 │   47.5 │
│ 2020-01-21T13:00:00 │ 142.88 │ 151.71 │ 140.67 │ 150.35 │   67.1 │
│ 2020-01-21T14:00:00 │ 150.02 │ 152.85 │ 148.64 │ 150.31 │   12.8 │
│ 2020-01-21T15:00:00 │ 150.84 │ 157.52 │ 150.84 │ 156.68 │   29.6 │
│ 2020-01-21T16:00:00 │ 157.44 │ 165.22 │ 157.44 │ 163.09 │   74.6 │
│ 2020-01-21T17:00:00 │ 163.36 │ 167.37 │ 163.08 │ 165.92 │   56.6 │
│ 2020-01-21T18:00:00 │ 166.68 │ 174.08 │ 166.68 │ 171.58 │   22.0 │
│ 2020-01-21T19:00:00 │ 170.61 │ 174.85 │ 169.47 │ 171.41 │   29.6 │
└─────────────────────┴────────┴────────┴────────┴────────┴────────┘
                                                    476 rows omitted

julia> data["Stock3"]
500×5 TimeArray{Float64, 2, DateTime, Matrix{Float64}} 2020-01-01T00:00:00 to 2020-01-21T19:00:00
┌─────────────────────┬────────┬────────┬───────┬────────┬────────┐
│                     │ Open   │ High   │ Low   │ Close  │ Volume │
├─────────────────────┼────────┼────────┼───────┼────────┼────────┤
│ 2020-01-01T00:00:00 │  44.15 │  46.02 │ 40.92 │  44.89 │   24.8 │
│ 2020-01-01T01:00:00 │  45.06 │  50.57 │ 43.49 │  49.09 │   45.2 │
│ 2020-01-01T02:00:00 │  49.96 │  54.79 │ 48.06 │  53.76 │   21.9 │
│ 2020-01-01T03:00:00 │   53.2 │  59.82 │ 52.42 │  56.41 │    6.2 │
│ 2020-01-01T04:00:00 │  56.04 │  59.03 │ 53.74 │  54.75 │   92.3 │
│ 2020-01-01T05:00:00 │   54.8 │  56.29 │ 50.81 │  55.76 │   52.2 │
│ 2020-01-01T06:00:00 │  56.34 │   56.7 │ 52.95 │  53.04 │   72.6 │
│ 2020-01-01T07:00:00 │  52.87 │  53.49 │ 46.98 │  46.98 │   21.1 │
│ 2020-01-01T08:00:00 │  46.51 │  50.58 │ 44.67 │  49.95 │   52.5 │
│ 2020-01-01T09:00:00 │  49.37 │  49.68 │ 43.78 │  45.73 │   68.3 │
│ 2020-01-01T10:00:00 │  45.24 │  50.73 │ 45.24 │  50.73 │   45.9 │
│ 2020-01-01T11:00:00 │  51.21 │  53.11 │ 48.01 │  52.05 │   44.9 │
│          ⋮          │   ⋮    │   ⋮    │   ⋮   │   ⋮    │   ⋮    │
│ 2020-01-21T08:00:00 │  85.54 │   88.5 │ 84.51 │  86.84 │   91.9 │
│ 2020-01-21T09:00:00 │  86.63 │  86.63 │ 80.47 │  84.93 │   49.2 │
│ 2020-01-21T10:00:00 │   85.7 │  87.37 │ 79.86 │  80.99 │   59.1 │
│ 2020-01-21T11:00:00 │   81.5 │  83.25 │ 77.61 │  79.87 │   25.4 │
│ 2020-01-21T12:00:00 │  80.07 │  80.07 │ 74.48 │  74.48 │   65.7 │
│ 2020-01-21T13:00:00 │  74.04 │  76.15 │ 71.99 │   75.5 │   84.9 │
│ 2020-01-21T14:00:00 │  75.42 │  82.62 │ 75.42 │  78.98 │   35.5 │
│ 2020-01-21T15:00:00 │  78.84 │  80.16 │ 75.16 │  75.52 │   70.6 │
│ 2020-01-21T16:00:00 │  75.63 │  75.63 │ 70.72 │  73.43 │   46.1 │
│ 2020-01-21T17:00:00 │   73.1 │  75.34 │  71.0 │  71.77 │   14.9 │
│ 2020-01-21T18:00:00 │  72.43 │  74.53 │ 68.28 │  68.28 │   81.8 │
│ 2020-01-21T19:00:00 │  68.24 │  68.79 │ 63.75 │   67.1 │   96.2 │
└─────────────────────┴────────┴────────┴───────┴────────┴────────┘
                                                   476 rows omitted

Unfortunately, I don't know how to get this into YAXArrays.jl.

femtotrader • Apr 25 '24 20:04

You could construct a YAXArray from every separate stock with this:

julia> s = data["Stock1"]
julia> d = (Ti(timestamp(s)), Dim{:colnames}(colnames(s)))

julia> YAXArray(d, values(s));

This constructs a two-dimensional YAXArray from the data in the TimeArray. If you would like a three-dimensional YAXArray with a dimension for the stocks, you could use cat(yaxlist, dims=Dim{:Stock}(["1", "2", "3"])). Alternatively, you could use a Dataset, which behaves more like a Dict and can hold arrays with different dimensions.

felixcremer • Apr 25 '24 22:04

using YAXArrays
d = (Ti(timestamp(s)), Dim{:colnames}(colnames(s)))

is broken. It raises

ERROR: UndefVarError: `Ti` not defined

femtotrader • Apr 26 '24 07:04

using DimensionalData: DimensionalData as DD

and using DD.Ti should help

femtotrader • Apr 26 '24 08:04

Yes, sorry, I forgot the import of DD. Is this what you had in mind?

felixcremer • Apr 26 '24 08:04

What I had in mind was to provide a full example like so:

using MarketData
using DataStructures
using YAXArrays
using DimensionalData: DimensionalData as DD

d_data = OrderedDict("Stock1" => random_ohlcv(), "Stock2" => random_ohlcv(), "Stock3" => random_ohlcv())

yaxlist = YAXArray[]
for (stock, stock_data) in d_data
    d = (DD.Ti(timestamp(stock_data)), Dim{:colnames}(colnames(stock_data)))
    yax = YAXArray(d, values(stock_data))
    push!(yaxlist, yax)
end
data = cat(yaxlist, dims=Dim{:Stock}(keys(d_data)))

but the last line fails:

ERROR: MethodError: no method matching iterate(::Dim{:Stock, Base.KeySet{String, OrderedDict{String, TimeArray{Float64, 2, DateTime, Matrix{Float64}}}}})

Closest candidates are:
  iterate(::Base.AsyncGenerator, ::Base.AsyncGeneratorState)
   @ Base asyncmap.jl:362
  iterate(::Base.AsyncGenerator)
   @ Base asyncmap.jl:362
  iterate(::DataStructures.TrieIterator)
   @ DataStructures C:\Users\femto\.julia\packages\DataStructures\95DJa\src\trie.jl:112
  ...

The same happens with a collected key vector:

data = cat(yaxlist, dims=Dim{:Stock}(collect(keys(d_data))))
ERROR: MethodError: no method matching isless(::String, ::Int64)

Closest candidates are:
  isless(::Missing, ::Any)
   @ Base missing.jl:87
  isless(::Any, ::Missing)
   @ Base missing.jl:88
  isless(::ForwardDiff.Dual{Tx}, ::Integer) where Tx
   @ ForwardDiff C:\Users\femto\.julia\packages\ForwardDiff\PcZ48\src\dual.jl:144

femtotrader • Apr 26 '24 09:04
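
One thing that might be worth trying instead of `cat` is `concatenatecubes` from YAXArrays, which stacks cubes that share identical axes along a new axis. The sketch below is untested against the package versions used in this thread. The small random arrays are hypothetical stand-ins for the `random_ohlcv()` data, and it assumes `concatenatecubes` accepts a `Dim` as the new axis.

```julia
using YAXArrays
using DimensionalData: DimensionalData as DD

# Hypothetical stand-in for the OHLCV data: three 4×2 arrays on shared axes.
stocks = ["Stock1", "Stock2", "Stock3"]
axes_2d = (DD.Ti(1:4), DD.Dim{:colnames}(["Open", "Close"]))
yaxlist = [YAXArray(axes_2d, rand(4, 2)) for _ in stocks]

# Stack the cubes along a new :Stock axis (assumed API, see the note above).
# The axis needs one entry per cube in yaxlist.
cube = concatenatecubes(yaxlist, DD.Dim{:Stock}(stocks))
```

If this works, the `Stock` dimension is passed as a plain vector of names rather than a `KeySet`, which also sidesteps the `iterate` error shown above.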