MLDataPattern.jl Eachbatch on slidingwindow returns unexpected results

Open adinhobl opened this issue 4 years ago • 1 comments

I've been stuck on this for a while now and just traced it back to the behavior of eachbatch being different than what I would expect.

My data is shown below. Each row presents the values of all features at that timestep. I am using a shortened dataset below to provide an example.

5×19 DataFrame
 Row │ p (mbar)  T (degC)  Tpot (K)  Tdew (degC)  rh (%)    VPmax (mbar)  VPact (mbar)  VPdef (mbar)  sh (g/kg)  ⋯
     │ Float64   Float64   Float64   Float64      Float64   Float64       Float64       Float64       Float64    ⋯
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 0.945308  -1.98247  -2.04189     -1.91897  1.1171        -1.30285      -1.47732     -0.790424   -1.48004  ⋯
   2 │ 0.95977   -2.07837  -2.13817     -2.06096  1.04462       -1.33014      -1.53435     -0.786272   -1.53619
   3 │ 0.986284  -2.07028  -2.13244     -2.04519  1.06274       -1.32884      -1.52723     -0.788348   -1.5287
   4 │ 1.00436   -2.09801  -2.16109     -2.09682  1.00838       -1.33664      -1.54624     -0.782121   -1.54742
   5 │ 1.06101   -2.16503  -2.23215     -2.18718  0.984214      -1.35354      -1.5795      -0.782121   -1.58111  ⋯

My data is formatted using a slidingwindow where I have 19 features, and I want to predict the value of one element from the next timestep given just the current step as history. So I expect slidingwindow to produce samples where my data is a tuple of arrays, where each array is 19 x 1.

julia> data = slidingwindow(i -> i+h:i+h+f-1, Array(df)', h, stride=1)
4-element slidingwindow(::var"#49#50", ::LinearAlgebra.Adjoint{Float64,Array{Float64,2}}, 1) with eltype Tuple:
 ([0.945307599461624; -1.9824732337923627; … ; -0.06105235998429366; 1.4284340764807328], [0.9597698467321998; -2.0783721131577844; … ; -0.060029350634975095; 1.4284235902887745])
 ([0.9597698467321998; -2.0783721131577844; … ; -0.060029350634975095; 1.4284235902887745], [0.9862839667282576; -2.0702842558619055; … ; -0.05900634918042319; 1.4284123819361194])
 ([0.9862839667282576; -2.0702842558619055; … ; -0.05900634918042319; 1.4284123819361194], [1.0043617758164738; -2.098014052304919; … ; -0.05798335614623251; 1.4284004514285258])
 ([1.0043617758164738; -2.098014052304919; … ; -0.05798335614623251; 1.4284004514285258], [1.06100557762623; -2.165027727042202; … ; -0.0569603720579933; 1.4283877987721232])

Indeed, that is what it returns. Now I have 4 tuples, each with a 19 features x 1 timestep array for both the X and y. Technically, the type of each array is 19×1 view(::LinearAlgebra.Adjoint{Float64,Array{Float64,2}}, :, 2:2) with eltype Float64.
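For readers without the dataset, the window layout described above can be imitated in plain Julia. This is only a sketch under assumptions: h = f = 1 (the issue does not state their values), a toy 3×5 matrix stands in for Array(df)', and manual_windows is a hypothetical helper, not part of MLDataPattern.

```julia
# Toy stand-in for Array(df)': 3 features × 5 timesteps (the real data is 19 × 5).
A = reshape(collect(1.0:15.0), 3, 5)

# Assumed values; the issue does not say what h and f are.
h, f = 1, 1

# Hypothetical helper: pair each history window with the window that follows it.
function manual_windows(A, h, f)
    T = size(A, 2)                 # number of timesteps (columns)
    last_start = T - h - f + 1     # last index whose target still fits in the data
    return [(view(A, :, i:i+h-1), view(A, :, i+h:i+h+f-1)) for i in 1:last_start]
end

samples = manual_windows(A, h, f)

println(length(samples))        # number of (X, y) pairs
println(size(samples[1][1]))    # shape of one X
```

With five timesteps and h = f = 1 this yields four pairs, and the y of one pair is the X of the next, which matches the shape of the output shown above.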

Now, in my training loop, I would like to iterate over these samples and train a model on each X, y pair. If I do:

julia> for i in data
           @show i
       end
i = ([0.945307599461624; -1.9824732337923627; -2.0418884431001203; -1.9189727676846546; 1.1171015227337124; -1.3028511908231284; -1.4773232104641594; -0.7904236214710557; -1.480036367036059; -1.4826972088343169; 2.2185238106952334; 0.19340923901506027; 0.2211612940667541; 0.11114045471718151; 0.21792787317689125; 0.3661105594628274; 1.3660687962126672; -0.06105235998429366; 1.4284340764807328], [0.9597698467321998; -2.0783721131577844; -2.138166319274252; -2.0609637294850622; 1.0446173421369407; -1.330142569548392; -1.5343539122850658; -0.7862722979194303; -1.5361898079543617; -1.5390345155803171; 2.3257075520585757; 0.17298677377550387; 0.22210086635262455; 0.10945824511441363; 0.22779849852930875; 0.707199726148972; 1.224794368206394; -0.060029350634975095; 1.4284235902887745])
i = ([0.9597698467321998; -2.0783721131577844; -2.138166319274252; -2.0609637294850622; 1.0446173421369407; -1.330142569548392; -1.5343539122850658; -0.7862722979194303; -1.5361898079543617; -1.5390345155803171; 2.3257075520585757; 0.17298677377550387; 0.22210086635262455; 0.10945824511441363; 0.22779849852930875; 0.707199726148972; 1.224794368206394; -0.060029350634975095; 1.4284235902887745], [0.9862839667282576; -2.0702842558619055; -2.1324354933115113; -2.0451869559516838; 1.0627383872861333; -1.3288429800852843; -1.5272250745574525; -0.788347959695243; -1.5287026824985879; -1.531992352237067; 2.3239984719001643; 0.20798270241233952; 0.27626601473376766; 0.11121805128964106; 0.3240784159666602; 1.000099633755878; 1.0000592075027501; -0.05900634918042319; 1.4284123819361194])
i = ([0.9862839667282576; -2.0702842558619055; -2.1324354933115113; -2.0451869559516838; 1.0627383872861333; -1.3288429800852843; -1.5272250745574525; -0.788347959695243; -1.5287026824985879; -1.531992352237067; 2.3239984719001643; 0.20798270241233952; 0.27626601473376766; 0.11121805128964106; 0.3240784159666602; 1.000099633755878; 1.0000592075027501; -0.05900634918042319; 1.4284123819361194], [1.0043617758164738; -2.098014052304919; -2.1610896231252417; -2.096820032970014; 1.0083752518385543; -1.3366405168639308; -1.5462353084977547; -0.7821209743678049; -1.5474204961380218; -1.5531188422668172; 2.3589125379934575; 0.27034296641382005; 0.19526654910958366; 0.24690733563781467; 0.1451755627164811; 1.2248496376407976; 0.7071786439053358; -0.05798335614623251; 1.4284004514285258])
i = ([1.0043617758164738; -2.098014052304919; -2.1610896231252417; -2.096820032970014; 1.0083752518385543; -1.3366405168639308; -1.5462353084977547; -0.7821209743678049; -1.5474204961380218; -1.5531188422668172; 2.3589125379934575; 0.27034296641382005; 0.19526654910958366; 0.24690733563781467; 0.1451755627164811; 1.2248496376407976; 0.7071786439053358; -0.05798335614623251; 1.4284004514285258], [1.06100557762623; -2.165027727042202; -2.232151865063287; -2.1871779177520922; 0.9842138583062976; -1.3535351798843323; -1.5795032178932833; -0.7821209743678049; -1.5811125606890033; -1.5859822712019838; 2.446319780380877; 0.11226402369212508; 0.35081815405696193; 0.048640493357961015; 0.40205326621233467; 1.366133396461856; 0.3661120037946213; -0.0569603720579933; 1.4283877987721232])

to imitate my training loop, everything works well and I can process each sample as I would expect. I would like to do this operation using eachbatch because it is less memory-intensive and the long-term goal is to batch these samples together. When I do the same with eachbatch of the data, I expect roughly the same thing to happen, and the docs lead me to believe that my tuples will be passed to me one by one.

If I try it manually specifying size=1, it seems to return what I want:

julia> for i in eachbatch(data, size=1)
           @show i 
       end
i = (SubArray{Float64,2,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},Array{Int64,1}},false}[[0.945307599461624; -1.9824732337923627; -2.0418884431001203; -1.9189727676846546; 1.1171015227337124; -1.3028511908231284; -1.4773232104641594; -0.7904236214710557; -1.480036367036059; -1.4826972088343169; 2.2185238106952334; 0.19340923901506027; 0.2211612940667541; 0.11114045471718151; 0.21792787317689125; 0.3661105594628274; 1.3660687962126672; -0.06105235998429366; 1.4284340764807328]], SubArray{Float64,2,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},Array{Int64,1}},false}[[0.9597698467321998; -2.0783721131577844; -2.138166319274252; -2.0609637294850622; 1.0446173421369407; -1.330142569548392; -1.5343539122850658; -0.7862722979194303; -1.5361898079543617; -1.5390345155803171; 2.3257075520585757; 0.17298677377550387; 0.22210086635262455; 0.10945824511441363; 0.22779849852930875; 0.707199726148972; 1.224794368206394; -0.060029350634975095; 1.4284235902887745]])
i = (SubArray{Float64,2,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},Array{Int64,1}},false}[[0.9597698467321998; -2.0783721131577844; -2.138166319274252; -2.0609637294850622; 1.0446173421369407; -1.330142569548392; -1.5343539122850658; -0.7862722979194303; -1.5361898079543617; -1.5390345155803171; 2.3257075520585757; 0.17298677377550387; 0.22210086635262455; 0.10945824511441363; 0.22779849852930875; 0.707199726148972; 1.224794368206394; -0.060029350634975095; 1.4284235902887745]], SubArray{Float64,2,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},Array{Int64,1}},false}[[0.9862839667282576; -2.0702842558619055; -2.1324354933115113; -2.0451869559516838; 1.0627383872861333; -1.3288429800852843; -1.5272250745574525; -0.788347959695243; -1.5287026824985879; -1.531992352237067; 2.3239984719001643; 0.20798270241233952; 0.27626601473376766; 0.11121805128964106; 0.3240784159666602; 1.000099633755878; 1.0000592075027501; -0.05900634918042319; 1.4284123819361194]])
i = (SubArray{Float64,2,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},Array{Int64,1}},false}[[0.9862839667282576; -2.0702842558619055; -2.1324354933115113; -2.0451869559516838; 1.0627383872861333; -1.3288429800852843; -1.5272250745574525; -0.788347959695243; -1.5287026824985879; -1.531992352237067; 2.3239984719001643; 0.20798270241233952; 0.27626601473376766; 0.11121805128964106; 0.3240784159666602; 1.000099633755878; 1.0000592075027501; -0.05900634918042319; 1.4284123819361194]], SubArray{Float64,2,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},Array{Int64,1}},false}[[1.0043617758164738; -2.098014052304919; -2.1610896231252417; -2.096820032970014; 1.0083752518385543; -1.3366405168639308; -1.5462353084977547; -0.7821209743678049; -1.5474204961380218; -1.5531188422668172; 2.3589125379934575; 0.27034296641382005; 0.19526654910958366; 0.24690733563781467; 0.1451755627164811; 1.2248496376407976; 0.7071786439053358; -0.05798335614623251; 1.4284004514285258]])
i = (SubArray{Float64,2,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},Array{Int64,1}},false}[[1.0043617758164738; -2.098014052304919; -2.1610896231252417; -2.096820032970014; 1.0083752518385543; -1.3366405168639308; -1.5462353084977547; -0.7821209743678049; -1.5474204961380218; -1.5531188422668172; 2.3589125379934575; 0.27034296641382005; 0.19526654910958366; 0.24690733563781467; 0.1451755627164811; 1.2248496376407976; 0.7071786439053358; -0.05798335614623251; 1.4284004514285258]], SubArray{Float64,2,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},Array{Int64,1}},false}[[1.06100557762623; -2.165027727042202; -2.232151865063287; -2.1871779177520922; 0.9842138583062976; -1.3535351798843323; -1.5795032178932833; -0.7821209743678049; -1.5811125606890033; -1.5859822712019838; 2.446319780380877; 0.11226402369212508; 0.35081815405696193; 0.048640493357961015; 0.40205326621233467; 1.366133396461856; 0.3661120037946213; -0.0569603720579933; 1.4283877987721232]])

If I exclude size=1, I would expect that to be implicit, per the docs, but that's not what happens:

julia> for i in eachbatch(data)
           @show i 
       end
i = (SubArray{Float64,2,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},Array{Int64,1}},false}[[0.945307599461624 0.9597698467321998; -1.9824732337923627 -2.0783721131577844; -2.0418884431001203 -2.138166319274252; -1.9189727676846546 -2.0609637294850622; 1.1171015227337124 1.0446173421369407; -1.3028511908231284 -1.330142569548392; -1.4773232104641594 -1.5343539122850658; -0.7904236214710557 -0.7862722979194303; -1.480036367036059 -1.5361898079543617; -1.4826972088343169 -1.5390345155803171; 2.2185238106952334 2.3257075520585757; 0.19340923901506027 0.17298677377550387; 0.2211612940667541 0.22210086635262455; 0.11114045471718151 0.10945824511441363; 0.21792787317689125 0.22779849852930875; 0.3661105594628274 0.707199726148972; 1.3660687962126672 1.224794368206394; -0.06105235998429366 -0.060029350634975095; 1.4284340764807328 1.4284235902887745]], SubArray{Float64,2,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},Array{Int64,1}},false}[[0.9597698467321998 0.9862839667282576; -2.0783721131577844 -2.0702842558619055; -2.138166319274252 -2.1324354933115113; -2.0609637294850622 -2.0451869559516838; 1.0446173421369407 1.0627383872861333; -1.330142569548392 -1.3288429800852843; -1.5343539122850658 -1.5272250745574525; -0.7862722979194303 -0.788347959695243; -1.5361898079543617 -1.5287026824985879; -1.5390345155803171 -1.531992352237067; 2.3257075520585757 2.3239984719001643; 0.17298677377550387 0.20798270241233952; 0.22210086635262455 0.27626601473376766; 0.10945824511441363 0.11121805128964106; 0.22779849852930875 0.3240784159666602; 0.707199726148972 1.000099633755878; 1.224794368206394 1.0000592075027501; -0.060029350634975095 -0.05900634918042319; 1.4284235902887745 1.4284123819361194]])
i = (SubArray{Float64,2,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},Array{Int64,1}},false}[[0.9862839667282576 1.0043617758164738; -2.0702842558619055 -2.098014052304919; -2.1324354933115113 -2.1610896231252417; -2.0451869559516838 -2.096820032970014; 1.0627383872861333 1.0083752518385543; -1.3288429800852843 -1.3366405168639308; -1.5272250745574525 -1.5462353084977547; -0.788347959695243 -0.7821209743678049; -1.5287026824985879 -1.5474204961380218; -1.531992352237067 -1.5531188422668172; 2.3239984719001643 2.3589125379934575; 0.20798270241233952 0.27034296641382005; 0.27626601473376766 0.19526654910958366; 0.11121805128964106 0.24690733563781467; 0.3240784159666602 0.1451755627164811; 1.000099633755878 1.2248496376407976; 1.0000592075027501 0.7071786439053358; -0.05900634918042319 -0.05798335614623251; 1.4284123819361194 1.4284004514285258]], SubArray{Float64,2,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},Array{Int64,1}},false}[[1.0043617758164738 1.06100557762623; -2.098014052304919 -2.165027727042202; -2.1610896231252417 -2.232151865063287; -2.096820032970014 -2.1871779177520922; 1.0083752518385543 0.9842138583062976; -1.3366405168639308 -1.3535351798843323; -1.5462353084977547 -1.5795032178932833; -0.7821209743678049 -0.7821209743678049; -1.5474204961380218 -1.5811125606890033; -1.5531188422668172 -1.5859822712019838; 2.3589125379934575 2.446319780380877; 0.27034296641382005 0.11226402369212508; 0.19526654910958366 0.35081815405696193; 0.24690733563781467 0.048640493357961015; 0.1451755627164811 0.40205326621233467; 1.2248496376407976 1.366133396461856; 0.7071786439053358 0.3661120037946213; -0.05798335614623251 -0.0569603720579933; 1.4284004514285258 1.4283877987721232]])

When I do that, it only passes me back two samples, and each one looks like it has transformed my (X,y) tuples by concatenating an X,y pair together and passing it back as the new X, and then concatenating the next X,y pair together and passing it back as the new y.
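One thing worth checking against the printed values: the first element of the first batch holds timesteps 1 and 2, and the second element holds timesteps 2 and 3. With one step of history and one step of target, each y is also the next X, so "X and y of one sample glued together" and "two consecutive X's glued together" are the same matrix. The plain-Julia sketch below illustrates that ambiguity on a toy 3×5 matrix; it assumes h = f = 1 and does not call MLDataPattern, so it says nothing definite about what eachbatch does internally.

```julia
# Toy stand-in for Array(df)': 3 features × 5 timesteps.
A = reshape(collect(1.0:15.0), 3, 5)

# One-step history and one-step target, written out by hand (assumed h = f = 1).
Xs = [A[:, i:i] for i in 1:4]
ys = [A[:, i+1:i+1] for i in 1:4]

# Batch of two samples, concatenated along the observation (column) dimension.
Xb = hcat(Xs[1], Xs[2])   # timesteps 1 and 2
yb = hcat(ys[1], ys[2])   # timesteps 2 and 3

# The X batch is indistinguishable from "X1 next to y1", because y1 equals X2.
println(Xb == hcat(Xs[1], ys[1]))
```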

This same result is returned if I use size=2 as an argument.

julia> for i in eachbatch(data, size=2)
           @show i 
       end
i = (SubArray{Float64,2,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},Array{Int64,1}},false}[[0.945307599461624 0.9597698467321998; -1.9824732337923627 -2.0783721131577844; -2.0418884431001203 -2.138166319274252; -1.9189727676846546 -2.0609637294850622; 1.1171015227337124 1.0446173421369407; -1.3028511908231284 -1.330142569548392; -1.4773232104641594 -1.5343539122850658; -0.7904236214710557 -0.7862722979194303; -1.480036367036059 -1.5361898079543617; -1.4826972088343169 -1.5390345155803171; 2.2185238106952334 2.3257075520585757; 0.19340923901506027 0.17298677377550387; 0.2211612940667541 0.22210086635262455; 0.11114045471718151 0.10945824511441363; 0.21792787317689125 0.22779849852930875; 0.3661105594628274 0.707199726148972; 1.3660687962126672 1.224794368206394; -0.06105235998429366 -0.060029350634975095; 1.4284340764807328 1.4284235902887745]], SubArray{Float64,2,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},Array{Int64,1}},false}[[0.9597698467321998 0.9862839667282576; -2.0783721131577844 -2.0702842558619055; -2.138166319274252 -2.1324354933115113; -2.0609637294850622 -2.0451869559516838; 1.0446173421369407 1.0627383872861333; -1.330142569548392 -1.3288429800852843; -1.5343539122850658 -1.5272250745574525; -0.7862722979194303 -0.788347959695243; -1.5361898079543617 -1.5287026824985879; -1.5390345155803171 -1.531992352237067; 2.3257075520585757 2.3239984719001643; 0.17298677377550387 0.20798270241233952; 0.22210086635262455 0.27626601473376766; 0.10945824511441363 0.11121805128964106; 0.22779849852930875 0.3240784159666602; 0.707199726148972 1.000099633755878; 1.224794368206394 1.0000592075027501; -0.060029350634975095 -0.05900634918042319; 1.4284235902887745 1.4284123819361194]])
i = (SubArray{Float64,2,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},Array{Int64,1}},false}[[0.9862839667282576 1.0043617758164738; -2.0702842558619055 -2.098014052304919; -2.1324354933115113 -2.1610896231252417; -2.0451869559516838 -2.096820032970014; 1.0627383872861333 1.0083752518385543; -1.3288429800852843 -1.3366405168639308; -1.5272250745574525 -1.5462353084977547; -0.788347959695243 -0.7821209743678049; -1.5287026824985879 -1.5474204961380218; -1.531992352237067 -1.5531188422668172; 2.3239984719001643 2.3589125379934575; 0.20798270241233952 0.27034296641382005; 0.27626601473376766 0.19526654910958366; 0.11121805128964106 0.24690733563781467; 0.3240784159666602 0.1451755627164811; 1.000099633755878 1.2248496376407976; 1.0000592075027501 0.7071786439053358; -0.05900634918042319 -0.05798335614623251; 1.4284123819361194 1.4284004514285258]], SubArray{Float64,2,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},Array{Int64,1}},false}[[1.0043617758164738 1.06100557762623; -2.098014052304919 -2.165027727042202; -2.1610896231252417 -2.232151865063287; -2.096820032970014 -2.1871779177520922; 1.0083752518385543 0.9842138583062976; -1.3366405168639308 -1.3535351798843323; -1.5462353084977547 -1.5795032178932833; -0.7821209743678049 -0.7821209743678049; -1.5474204961380218 -1.5811125606890033; -1.5531188422668172 -1.5859822712019838; 2.3589125379934575 2.446319780380877; 0.27034296641382005 0.11226402369212508; 0.19526654910958366 0.35081815405696193; 0.24690733563781467 0.048640493357961015; 0.1451755627164811 0.40205326621233467; 1.2248496376407976 1.366133396461856; 0.7071786439053358 0.3661120037946213; -0.05798335614623251 -0.0569603720579933; 1.4284004514285258 1.4283877987721232]])

Also, if I scale up batch sizes for time series, I would expect my (X,y) tuples of 19x1 arrays to become 19x1xn arrays, where n is the batch size, rather than 19xn arrays, which is what happens as I increase the size.

julia> for i in eachbatch(data, size=3)
           @show i 
       end
i = (SubArray{Float64,2,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},Array{Int64,1}},false}[[0.945307599461624 0.9597698467321998 0.9862839667282576; -1.9824732337923627 -2.0783721131577844 -2.0702842558619055; -2.0418884431001203 -2.138166319274252 -2.1324354933115113; -1.9189727676846546 -2.0609637294850622 -2.0451869559516838; 1.1171015227337124 1.0446173421369407 1.0627383872861333; -1.3028511908231284 -1.330142569548392 -1.3288429800852843; -1.4773232104641594 -1.5343539122850658 -1.5272250745574525; -0.7904236214710557 -0.7862722979194303 -0.788347959695243; -1.480036367036059 -1.5361898079543617 -1.5287026824985879; -1.4826972088343169 -1.5390345155803171 -1.531992352237067; 2.2185238106952334 2.3257075520585757 2.3239984719001643; 0.19340923901506027 0.17298677377550387 0.20798270241233952; 0.2211612940667541 0.22210086635262455 0.27626601473376766; 0.11114045471718151 0.10945824511441363 0.11121805128964106; 0.21792787317689125 0.22779849852930875 0.3240784159666602; 0.3661105594628274 0.707199726148972 1.000099633755878; 1.3660687962126672 1.224794368206394 1.0000592075027501; -0.06105235998429366 -0.060029350634975095 -0.05900634918042319; 1.4284340764807328 1.4284235902887745 1.4284123819361194]], SubArray{Float64,2,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},Array{Int64,1}},false}[[0.9597698467321998 0.9862839667282576 1.0043617758164738; -2.0783721131577844 -2.0702842558619055 -2.098014052304919; -2.138166319274252 -2.1324354933115113 -2.1610896231252417; -2.0609637294850622 -2.0451869559516838 -2.096820032970014; 1.0446173421369407 1.0627383872861333 1.0083752518385543; -1.330142569548392 -1.3288429800852843 -1.3366405168639308; -1.5343539122850658 -1.5272250745574525 -1.5462353084977547; -0.7862722979194303 -0.788347959695243 -0.7821209743678049; -1.5361898079543617 -1.5287026824985879 -1.5474204961380218; -1.5390345155803171 -1.531992352237067 -1.5531188422668172; 
2.3257075520585757 2.3239984719001643 2.3589125379934575; 0.17298677377550387 0.20798270241233952 0.27034296641382005; 0.22210086635262455 0.27626601473376766 0.19526654910958366; 0.10945824511441363 0.11121805128964106 0.24690733563781467; 0.22779849852930875 0.3240784159666602 0.1451755627164811; 0.707199726148972 1.000099633755878 1.2248496376407976; 1.224794368206394 1.0000592075027501 0.7071786439053358; -0.060029350634975095 -0.05900634918042319 -0.05798335614623251; 1.4284235902887745 1.4284123819361194 1.4284004514285258]])
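To make the two layouts concrete, here is a small plain-Julia sketch of the difference between the 19xn result above and the 19x1xn result I was expecting. It uses toy 3×1 arrays and only Base functions (hcat, cat, reshape); the names xs, flat and stacked are made up for the example.

```julia
# Four toy "samples", each 3 features × 1 timestep.
xs = [reshape(collect(1.0:3.0) .+ 10k, 3, 1) for k in 1:4]

# Concatenating along the existing second dimension gives features × n.
flat = reduce(hcat, xs)

# Concatenating along a new third dimension gives features × 1 × n.
stacked = cat(xs...; dims=3)

println(size(flat))
println(size(stacked))
```

Because Julia arrays are column-major, reshape(flat, 3, 1, :) holds the same values as stacked, so when every sample has a single timestep the 19xn output can be converted after the fact.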

I'm not sure if I'm making a simple mistake, or if there is something about slidingwindow that makes this different. I have also tried obsdim= 1, 2, and 3 to see if that made a difference, but it always errors on ERROR: AssertionError: obsdim === default_obsdim(A), so I don't think that's it.

Also, I'm unsure if there is a better suggested method to process and batch timeseries data. I would be happy to hear any such recommendations. :)

Thank you for any help!

Dec 20 '20 16:12 adinhobl

It seems to give me what I would expect if I convert the sliding window with Array():

julia> data
4-element slidingwindow(::var"#60#61", ::LinearAlgebra.Adjoint{Float64,Array{Float64,2}}, 1, obsdim = 2) with eltype Tuple:
 ([0.945307599461624; -1.9824732337923627; … ; -0.06105235998429366; 1.4284340764807328], [0.9597698467321998; -2.0783721131577844; … ; -0.060029350634975095; 1.4284235902887745])
 ([0.9597698467321998; -2.0783721131577844; … ; -0.060029350634975095; 1.4284235902887745], [0.9862839667282576; -2.0702842558619055; … ; -0.05900634918042319; 1.4284123819361194])
 ([0.9862839667282576; -2.0702842558619055; … ; -0.05900634918042319; 1.4284123819361194], [1.0043617758164738; -2.098014052304919; … ; -0.05798335614623251; 1.4284004514285258])
 ([1.0043617758164738; -2.098014052304919; … ; -0.05798335614623251; 1.4284004514285258], [1.06100557762623; -2.165027727042202; … ; -0.0569603720579933; 1.4283877987721232])

julia> Array(data)
4-element Array{Tuple{SubArray{Float64,2,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},UnitRange{Int64}},false},SubArray{Float64,2,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},UnitRange{Int64}},false}},1}:
 ([0.945307599461624; -1.9824732337923627; … ; -0.06105235998429366; 1.4284340764807328], [0.9597698467321998; -2.0783721131577844; … ; -0.060029350634975095; 1.4284235902887745])
 ([0.9597698467321998; -2.0783721131577844; … ; -0.060029350634975095; 1.4284235902887745], [0.9862839667282576; -2.0702842558619055; … ; -0.05900634918042319; 1.4284123819361194])
 ([0.9862839667282576; -2.0702842558619055; … ; -0.05900634918042319; 1.4284123819361194], [1.0043617758164738; -2.098014052304919; … ; -0.05798335614623251; 1.4284004514285258])
 ([1.0043617758164738; -2.098014052304919; … ; -0.05798335614623251; 1.4284004514285258], [1.06100557762623; -2.165027727042202; … ; -0.0569603720579933; 1.4283877987721232])

julia> for i in eachbatch(Array(data),size=2)
           @show i
       end
i = Tuple{SubArray{Float64,2,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},UnitRange{Int64}},false},SubArray{Float64,2,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},UnitRange{Int64}},false}}[([0.945307599461624; -1.9824732337923627; -2.0418884431001203; -1.9189727676846546; 1.1171015227337124; -1.3028511908231284; -1.4773232104641594; -0.7904236214710557; -1.480036367036059; -1.4826972088343169; 2.2185238106952334; 0.19340923901506027; 0.2211612940667541; 0.11114045471718151; 0.21792787317689125; 0.3661105594628274; 1.3660687962126672; -0.06105235998429366; 1.4284340764807328], [0.9597698467321998; -2.0783721131577844; -2.138166319274252; -2.0609637294850622; 1.0446173421369407; -1.330142569548392; -1.5343539122850658; -0.7862722979194303; -1.5361898079543617; -1.5390345155803171; 2.3257075520585757; 0.17298677377550387; 0.22210086635262455; 0.10945824511441363; 0.22779849852930875; 0.707199726148972; 1.224794368206394; -0.060029350634975095; 1.4284235902887745]), ([0.9597698467321998; -2.0783721131577844; -2.138166319274252; -2.0609637294850622; 1.0446173421369407; -1.330142569548392; -1.5343539122850658; -0.7862722979194303; -1.5361898079543617; -1.5390345155803171; 2.3257075520585757; 0.17298677377550387; 0.22210086635262455; 0.10945824511441363; 0.22779849852930875; 0.707199726148972; 1.224794368206394; -0.060029350634975095; 1.4284235902887745], [0.9862839667282576; -2.0702842558619055; -2.1324354933115113; -2.0451869559516838; 1.0627383872861333; -1.3288429800852843; -1.5272250745574525; -0.788347959695243; -1.5287026824985879; -1.531992352237067; 2.3239984719001643; 0.20798270241233952; 0.27626601473376766; 0.11121805128964106; 0.3240784159666602; 1.000099633755878; 1.0000592075027501; -0.05900634918042319; 1.4284123819361194])]
i = Tuple{SubArray{Float64,2,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},UnitRange{Int64}},false},SubArray{Float64,2,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},UnitRange{Int64}},false}}[([0.9862839667282576; -2.0702842558619055; -2.1324354933115113; -2.0451869559516838; 1.0627383872861333; -1.3288429800852843; -1.5272250745574525; -0.788347959695243; -1.5287026824985879; -1.531992352237067; 2.3239984719001643; 0.20798270241233952; 0.27626601473376766; 0.11121805128964106; 0.3240784159666602; 1.000099633755878; 1.0000592075027501; -0.05900634918042319; 1.4284123819361194], [1.0043617758164738; -2.098014052304919; -2.1610896231252417; -2.096820032970014; 1.0083752518385543; -1.3366405168639308; -1.5462353084977547; -0.7821209743678049; -1.5474204961380218; -1.5531188422668172; 2.3589125379934575; 0.27034296641382005; 0.19526654910958366; 0.24690733563781467; 0.1451755627164811; 1.2248496376407976; 0.7071786439053358; -0.05798335614623251; 1.4284004514285258]), ([1.0043617758164738; -2.098014052304919; -2.1610896231252417; -2.096820032970014; 1.0083752518385543; -1.3366405168639308; -1.5462353084977547; -0.7821209743678049; -1.5474204961380218; -1.5531188422668172; 2.3589125379934575; 0.27034296641382005; 0.19526654910958366; 0.24690733563781467; 0.1451755627164811; 1.2248496376407976; 0.7071786439053358; -0.05798335614623251; 1.4284004514285258], [1.06100557762623; -2.165027727042202; -2.232151865063287; -2.1871779177520922; 0.9842138583062976; -1.3535351798843323; -1.5795032178932833; -0.7821209743678049; -1.5811125606890033; -1.5859822712019838; 2.446319780380877; 0.11226402369212508; 0.35081815405696193; 0.048640493357961015; 0.40205326621233467; 1.366133396461856; 0.3661120037946213; -0.0569603720579933; 1.4283877987721232])]

I just wouldn't expect to need to manually call Array() to make it work. This leads me to believe that there's an issue in the way that LabeledSlidingWindows are indexed for the purpose of batching. I also need to try it with larger timeslices than 1 step, although I don't think that should cause any issues inherently.
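In the meantime, once the samples are in a plain vector of tuples, the batching can be done by hand with Base only. The sketch below is an assumption-laden workaround, not MLDataPattern's API: stack_batches is a hypothetical helper, the samples are toy 3×1 arrays, and it stacks along a third dimension to get the features × timesteps × n layout mentioned earlier.

```julia
# Toy (X, y) pairs standing in for Array(data): each element is a 3×1 matrix.
samples = [(fill(Float64(i), 3, 1), fill(Float64(i + 1), 3, 1)) for i in 1:4]

# Hypothetical helper: group samples into chunks of n and stack X's and y's
# separately along a new third dimension.
function stack_batches(samples, n)
    return [(cat(first.(chunk)...; dims=3), cat(last.(chunk)...; dims=3))
            for chunk in Iterators.partition(samples, n)]
end

batches = stack_batches(samples, 2)

for (X, y) in batches
    println(size(X), " ", size(y))
end
```

Iterators.partition keeps a shorter final chunk when the number of samples is not a multiple of n, so the last batch may be smaller than the others.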

Dec 23 '20 01:12 adinhobl
