pmap vs map for DArray?
Searching through the source for DistributedArrays, it seems I'm supposed to use map instead of pmap on a DArray. Indeed, when I try to use pmap, strange things happen:
julia> testa = drand(36)
36-element DistributedArrays.DArray{Float64,1,Array{Float64,1}}:
0.636737
0.275769
0.961624
0.427848
0.668537
0.0215699
0.292591
0.487622
0.54222
0.579438
0.299413
0.0699156
0.985861
0.642223
0.0108336
0.466572
0.134984
0.0718047
0.600704
0.367337
0.722101
0.96763
0.427482
0.963513
0.467348
0.987774
0.773584
0.531576
0.0155698
0.383172
0.0347603
0.299581
0.0226568
0.687901
0.22271
0.238291
julia> testb = drand(36)
36-element DistributedArrays.DArray{Float64,1,Array{Float64,1}}:
0.933919
0.445692
0.0028197
0.722083
0.088373
0.820338
0.71782
0.972424
0.623399
0.157076
0.657007
0.0753378
0.712683
0.303925
0.591726
0.320129
0.5457
0.00830437
0.0753483
0.973917
0.171903
0.291315
0.875653
0.0619788
0.53868
0.069952
0.534305
0.798335
0.923633
0.239445
0.748613
0.00554521
0.650063
0.770877
0.237519
0.414616
julia> pmap( (x) -> (x*x), testa)
175-element Array{Any,1}:
0.405433
0.405433
0.405433
0.405433
0.405433
0.405433
0.0760486
0.0760486
0.0760486
0.0760486
0.0760486
0.0760486
0.92472
0.92472
0.92472
0.183054
0.183054
0.183054
0.92472
0.183054
0.183054
0.183054
0.446942
0.446942
0.446942
0.446942
0.000465259
0.446942
⋮
0.00120828
0.00120828
0.00120828
0.00120828
0.00120828
0.00120828
0.0897485
0.0897485
0.0897485
0.0897485
0.0897485
0.000513332
0.000513332
0.473207
0.000513332
0.000513332
0.473207
0.473207
0.473207
0.0495998
0.473207
0.0495998
0.0495998
0.0495998
0.0567826
0.0567826
0.0567826
0.0567826
julia> map( (x) -> (x*x), testb)
36-element DistributedArrays.DArray{Float64,1,Array{Float64,1}}:
0.872205
0.198641
7.95072e-6
0.521403
0.00780978
0.672954
0.515266
0.945609
0.388626
0.0246728
0.431658
0.00567578
0.507917
0.0923703
0.35014
0.102483
0.297789
6.89625e-5
0.00567737
0.948514
0.0295506
0.0848642
0.766769
0.00384137
0.290177
0.00489328
0.285482
0.637339
0.853098
0.0573337
0.560421
3.07494e-5
0.422581
0.594251
0.0564152
0.171907
(Sorry for how long that was!)
Yet intuitively, I would have expected pmap to work like map does on a distributed array. I'm fine with my intuition being wrong here, but should there be a note in the documentation to use map, and not pmap? What is the expected behaviour of pmap on a DArray?
@kshyatt sorry for taking so long to respond. pmap is meant for a local collection (for example, a vector of Array chunks) that you want to process in parallel, and it supports dynamic scheduling: work is queued, and as processes finish they ask the queue for more. map for DArrays is a computation that you want to run in parallel on data that is already distributed; if the data is distributed unevenly, you have to wait for the slowest computation over the data. In short, pmap is dynamically scheduled, while map on a DArray is statically scheduled.
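To make the distinction concrete, here is a minimal sketch. It assumes DistributedArrays is installed and adds two local worker processes; the worker count and the variable names are illustrative and not taken from this thread.

```julia
using Distributed
addprocs(2)                          # assumption: two local workers, for illustration only
@everywhere using DistributedArrays

d = drand(36)                        # data already distributed across the workers

# map on a DArray: each worker processes the chunk it already owns
# (static scheduling). The result is again a DArray.
sq_static = map(x -> x * x, d)

# pmap expects a local collection, so gather the data first. Elements are
# then handed to workers as they become free (dynamic scheduling), and the
# result is an ordinary Array.
local_copy = Array(d)
sq_dynamic = pmap(x -> x * x, local_copy)

# Both routes compute the same values.
@assert Array(sq_static) == sq_dynamic
```

Gathering with Array(d) moves all the data to the calling process, so this route mainly makes sense when the per-element work is expensive and uneven enough for dynamic scheduling to pay off.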
I don't think it really makes sense to support the semantics of pmap for DArray. I'm sympathetic that it is confusing, but it would be equally confusing to have an alias pmap == map for DArrays, which would diverge from the pmap model. I'll close the issue for now.
While implementing pmap for DArray may not make sense, I still think there are actionable items here. @kshyatt suggested putting something in the docs about this. I'd also suggest making pmap error in this case instead of returning garbage results. Would such a change make sense to you?
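As a rough illustration of the "error instead of garbage" idea, one could imagine a guard like the one below. This is only a sketch: safe_pmap is a hypothetical wrapper invented here, not a function in Distributed or DistributedArrays, and it is not a proposal for how the package itself should implement the check.

```julia
using Distributed
using DistributedArrays

# Hypothetical wrapper (not part of Distributed or DistributedArrays):
# ordinary collections are forwarded to pmap unchanged.
safe_pmap(f, c) = pmap(f, c)

# A DArray is rejected with a message pointing the caller at map.
safe_pmap(f, d::DArray) = throw(ArgumentError(
    "pmap is not supported for DArray; use map(f, d) instead"))

squares = safe_pmap(x -> x * x, [1, 2, 3])   # an ordinary Vector goes through pmap
```

Calling safe_pmap with a DArray (for example, one created by drand) raises an ArgumentError instead of returning a result of the wrong length.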
Sure, I'll try to add some docs tomorrow.
Don't worry about the wait. Thanks for the clarification re: pmap. I think the docs could be clearer about this distinction because right now many people like me will go "parallel map for a parallel array? Let's do this thing!"