julia
julia copied to clipboard
Merge collect() into Vector()?
I wonder whether collect()
isn't always equivalent to Vector()
, and could therefore be merged into it:
julia> methods(collect)
#3 methods for generic function "collect":
collect(r::Range{T<:Any}) at range.jl:753
collect{T}(::Type{T}, itr) at array.jl:217
collect(itr) at array.jl:224
The method for Range
returns a vector, so it could be replaced with Array()
, as both call vcat()
in the end. convert(Vector, ...)
currently fails and should be fixed.
The methods for general iterables are more complex, but they essentially always create an Array
too. Even when they call similar()
, it's on a 1:1
range, so they actually create a Vector
. So the collect()
machinery could be moved to Array()
/Vector()
to make them support conversion from any iterable.
This issue is related to discussions regarding similar()
(https://github.com/JuliaLang/julia/issues/13731). Indeed, Julia provides a few functions which appear to behave differently, but in practice behave like Array
, and callers end up depending on this assumption. This makes the API more complex and less explicit.
See also https://github.com/JuliaLang/julia/issues/12251 about merging full
into Array
.
duplicate of https://github.com/JuliaLang/julia/issues/12251?
Indeed. Let's keep this issue to discuss the case of collect
then, to avoid polluting the other thread.
It would be a lot cleaner, as that is actually what people want when using "our" collect method. I only know the method name collect
from Ruby where it is identical to our map
(and their own map
,...), just to add to the confusion.
collect
returns an Array, not (always) a Vector. To usecollect
rather than convert(Array,...)
indicates better that values/objects ) are to be gathered which possibly do not yet exists and which possibly will not exist anymore as iterable after the collection (for example when collecting iterables of random numbers, user inputs or unique ids). My impression is thatconvert
should work on things which exist and should not wait an indefinite time for the generation of a new value/event.
convert(Array, itr)
might indeed look a bit surprising when doing an irreversible operation, so we could only support the Array(itr)
form. I don't think it would really be less explicit than collect
. The fact that the operation might block is dependent on/indicated by the type of the iterator, not be the function. Currently collect(1:3)
will never block, but other types may do so.
By the way convert(Array, 1:3)
is defined anyway because isa(1:3,AbstractArray)
is true.
I really like this suggestion, lets get rid of collect
!
I like the idea too, but it is ambiguous with constructors that take dimensions, e.g. Array{T}((m,n))
. That will need to be resolved.
Yes, but how? Can we deprecate Array{T}(::Tuple)
in favor of Array{T}(::Integer...)
? Or would that introduce a performance penalty?
It's not about the performance penalty so much as many people prefer to use the dimension tuple form, and in some sense it's a better, more consistent way to specify this.
It might be too late for this, but I'd almost say arguments like reshape
would be ideal: Array(iter, dims)
. A zeros matrix would be Array(repeated(0), (m,n))
. We could allow passing an empty iterator to get an uninitialized array.
So if we don't want to get rid of Array{T}(::Tuple)
, we can't use Array{T}(itr)
as a replacement for collect(T, itr)
. That's why I wonder whether it's really needed: yes it's consistent, but varargs are just another way of passing a tuple, so it's also consistent IMHO.
Anyway, we can still use convert(Array{T}, itr)
to replace collect(T, itr)
, though it's more verbose. Opinions?
@JeffBezanson I'm not sure I get your point. That indeed sounds interesting, but it doesn't replace collect
, and there's still an ambiguity with Array{T}(iter, dims)
.
Just to add a data point, the recently introduced constructors from iterables of BitArrays (#19018) also inevitably ended up having the slightly annoying feature that tuples of Int
s are special-cased:
julia> BitArray([0,1,0])
3-element BitArray{1}:
false
true
false
julia> BitArray((0,1,0))
0×1×0 BitArray{3}
julia> BitArray((false,true,false))
3-element BitArray{1}:
false
true
false
It's probably less severe than the Array case. But perhaps if a general solution is not found and collect
is ultimately kept, we should have bitcollect
, by analogy with other methods like bitrand
and bitbroadcast
, instead of BitArray(iter)
? (I don't like it at all, but it would be more consistent.)
It's probably less severe than the Array case. But perhaps if a general solution is not found and collect is ultimately kept, we should have bitcollect, by analogy with other methods like bitrand and bitbroadcast, instead of BitArray(iter)? (I don't like it at all, but it would be more consistent.)
I thought we were rather trying to get rid of the bit*
functions in favor or more generic constructs. We should either support collect(arraytype, itr)
or recommend using BitArray
. Unfortunately, the former in ambiguous if you want to collect an iterator of arrays, and the second is ambiguous if you want to collect a tuple of integers...
I thought we were rather trying to get rid of the
bit*
functions in favor or more generic constructs.
In principle yes, except that good generic constructs to specify the output container type have not yet emerged to my knowledge (other than the constructors, of course).
good generic constructs to specify the output container type
Ref. #16740, work in that direction. Best!
Now that Array( ... )
has been deprecated, it would be totally possible just to rename collect
to Array
. Sadly, that would include Array(T, iter)
, but that's no worse than collect
itself.
For Array{T}
and Array{T,N}
the only unambiguous signature available is Array{T}(iter, dims::Tuple)
. That might be ok sometimes, but unfortunately Array{T}(iter)
is what we'd want most.
One option is to deprecate e.g. Array{T}(1,2,3)
to Array{T,3}(1,2,3)
. Or we could require something like Array{T}(uninitialized, dims)
for the current behavior, explicitly requesting uninitialized memory (uninitialized
would be a special constant).
Since collect
produces a vector, what about spelling this as Vector(itr)
and Vector{T}(itr)
?
Edit: I realized the problem with this as soon as I posted it – Vector(n)
and Vector{T}(n)
alread mean something and integers are iterable. Sad trombone...
ref https://github.com/JuliaLang/julia/issues/15120
Arrays are at this point one of the only collections that doesn't have constructors that take arbitrary iterators of contents to construct with, making them quite the odd one out. Perhaps we should consider deprecating Vector(dims...)
and Vector(dims)
instead and have blank(T, dims...)
and blank(T, dims)
(or maybe uninitialized
?) functions similar to zeros
or ones
instead. Then Vector(itr)
would be available for this kind of construction.
A blank
what? Why isn't Dict()
also blank? Should it be blank(Array{T}, dims)
? Then I can have blank(MyAbstractArray{T}, args...)
?
Then I can have
blank(MyAbstractArray{T}, args...)
?
That's essentially what #16740 implements, for ones
, zeros
etc.
If we decided to get rid of #undef
so that creating an uninitialized array no longer makes sense (except maybe as an optimization in very specific cases), we could require choosing between zeros
, ones
and nulls
to construct an array. In Nulls.jl, the latter function creates an Array{Union{Null, T}}
filled with null
s, which can now be converted without making a copy to an Array{T}
once all nulls have been replaced with valid values. (Cf. https://github.com/JuliaLang/julia/issues/9147.)
The name junk
came up on triage this week and everyone seemed to rather like it. As in, if you want a matrix full of zeros, you write zeros(10, 10)
, if you want a matrix full of ones, you write ones(10, 10)
, and if you want a matrix full of junk, you write junk(10, 10)
😁. If we went with junk
we could reclaim Vector
to do what collect
currently does.
With that, Array
can do what collect
currently does. Is the plan to deprecate collect
? Just that collect
has the ability to choose the right container for an iterator, which does not need to be dense array as it is now. For example an RaggedArray.jl
can support (ragged) generators which then can collect
to RaggedArray
s etc. etc.
You would just write RaggedArray(itr)
for that case.
... if you created itr
yourself, of course. But Julia has a rudimentary system in place where iterators inform about the mutable collection they are naturally collected into via traits - and collect(itr)
is a way to hook into this system where itr
is just any iterable argument to a function something(itr)
. Replacing collect
by Array
makes this system more restrictive, retaining both collect
and Array
leaves it open for extension. In the end this is a design decision and I'll better not keep discussing, but those options are not equivalent.
I guess we could allow using AbstractArray(itr)
to let itr
choose the most appropriate container type (with Array
as a fallback)?
If a
has non-1 indices, currently convert(Vector, a)
and collect(a)
behave differently: the first throws an error and the second treats a
as a list starting with the first item. Which would Vector(a)
do?
Sorry I didn't mention this earlier.
Which would Vector(a) do?
For general iterable a
, Vector(a)
should treat a
as a list starting with the first item? Best!