julia icon indicating copy to clipboard operation
julia copied to clipboard

Merge collect() into Vector()?

Open nalimilan opened this issue 8 years ago • 45 comments

I wonder whether collect() isn't always equivalent to Vector(), and could therefore be merged into it:

julia> methods(collect)
#3 methods for generic function "collect":
collect(r::Range{T<:Any}) at range.jl:753
collect{T}(::Type{T}, itr) at array.jl:217
collect(itr) at array.jl:224

The method for Range returns a vector, so it could be replaced with Array(), as both call vcat() in the end. convert(Vector, ...) currently fails and should be fixed.

The methods for general iterables are more complex, but they essentially always create an Array too. Even when they call similar(), it's on a 1:1 range, so they actually create a Vector. So the collect() machinery could be moved to Array()/Vector() to make them support conversion from any iterable.

This issue is related to discussions regarding similar() (https://github.com/JuliaLang/julia/issues/13731). Indeed, Julia provides a few functions which appear to behave differently, but in practice behave like Array, and callers end up depending on this assumption. This makes the API more complex and less explicit.

See also https://github.com/JuliaLang/julia/issues/12251 about merging full into Array.

nalimilan avatar Apr 24 '16 10:04 nalimilan

duplicate of https://github.com/JuliaLang/julia/issues/12251?

tkelman avatar Apr 25 '16 03:04 tkelman

Indeed. Let's keep this issue to discuss the case of collect then, to avoid polluting the other thread.

nalimilan avatar Apr 25 '16 08:04 nalimilan

It would be a lot cleaner, as that is actually what people want when using "our" collect method. I only know the method name collect from Ruby where it is identical to our map (and their own map,...), just to add to the confusion.

pkofod avatar Apr 25 '16 08:04 pkofod

collect returns an Array, not (always) a Vector. To usecollect rather than convert(Array,...) indicates better that values/objects ) are to be gathered which possibly do not yet exists and which possibly will not exist anymore as iterable after the collection (for example when collecting iterables of random numbers, user inputs or unique ids). My impression is thatconvert should work on things which exist and should not wait an indefinite time for the generation of a new value/event.

mschauer avatar Apr 26 '16 14:04 mschauer

convert(Array, itr) might indeed look a bit surprising when doing an irreversible operation, so we could only support the Array(itr) form. I don't think it would really be less explicit than collect. The fact that the operation might block is dependent on/indicated by the type of the iterator, not be the function. Currently collect(1:3) will never block, but other types may do so.

nalimilan avatar Apr 26 '16 14:04 nalimilan

By the way convert(Array, 1:3) is defined anyway because isa(1:3,AbstractArray) is true.

mschauer avatar Apr 26 '16 14:04 mschauer

I really like this suggestion, lets get rid of collect!

davidanthoff avatar Oct 19 '16 17:10 davidanthoff

I like the idea too, but it is ambiguous with constructors that take dimensions, e.g. Array{T}((m,n)). That will need to be resolved.

JeffBezanson avatar Oct 19 '16 17:10 JeffBezanson

Yes, but how? Can we deprecate Array{T}(::Tuple) in favor of Array{T}(::Integer...)? Or would that introduce a performance penalty?

nalimilan avatar Oct 19 '16 17:10 nalimilan

It's not about the performance penalty so much as many people prefer to use the dimension tuple form, and in some sense it's a better, more consistent way to specify this.

StefanKarpinski avatar Oct 20 '16 21:10 StefanKarpinski

It might be too late for this, but I'd almost say arguments like reshape would be ideal: Array(iter, dims). A zeros matrix would be Array(repeated(0), (m,n)). We could allow passing an empty iterator to get an uninitialized array.

JeffBezanson avatar Oct 20 '16 22:10 JeffBezanson

So if we don't want to get rid of Array{T}(::Tuple), we can't use Array{T}(itr) as a replacement for collect(T, itr). That's why I wonder whether it's really needed: yes it's consistent, but varargs are just another way of passing a tuple, so it's also consistent IMHO.

Anyway, we can still use convert(Array{T}, itr) to replace collect(T, itr), though it's more verbose. Opinions?

@JeffBezanson I'm not sure I get your point. That indeed sounds interesting, but it doesn't replace collect, and there's still an ambiguity with Array{T}(iter, dims).

nalimilan avatar Oct 21 '16 07:10 nalimilan

Just to add a data point, the recently introduced constructors from iterables of BitArrays (#19018) also inevitably ended up having the slightly annoying feature that tuples of Ints are special-cased:

julia> BitArray([0,1,0])
3-element BitArray{1}:
 false
  true
 false

julia> BitArray((0,1,0))
0×1×0 BitArray{3}

julia> BitArray((false,true,false))
3-element BitArray{1}:
 false
  true
 false

It's probably less severe than the Array case. But perhaps if a general solution is not found and collect is ultimately kept, we should have bitcollect, by analogy with other methods like bitrand and bitbroadcast, instead of BitArray(iter)? (I don't like it at all, but it would be more consistent.)

carlobaldassi avatar Oct 22 '16 08:10 carlobaldassi

It's probably less severe than the Array case. But perhaps if a general solution is not found and collect is ultimately kept, we should have bitcollect, by analogy with other methods like bitrand and bitbroadcast, instead of BitArray(iter)? (I don't like it at all, but it would be more consistent.)

I thought we were rather trying to get rid of the bit* functions in favor or more generic constructs. We should either support collect(arraytype, itr) or recommend using BitArray. Unfortunately, the former in ambiguous if you want to collect an iterator of arrays, and the second is ambiguous if you want to collect a tuple of integers...

nalimilan avatar Oct 22 '16 09:10 nalimilan

I thought we were rather trying to get rid of the bit* functions in favor or more generic constructs.

In principle yes, except that good generic constructs to specify the output container type have not yet emerged to my knowledge (other than the constructors, of course).

carlobaldassi avatar Oct 22 '16 17:10 carlobaldassi

good generic constructs to specify the output container type

Ref. #16740, work in that direction. Best!

Sacha0 avatar Oct 22 '16 17:10 Sacha0

Now that Array( ... ) has been deprecated, it would be totally possible just to rename collect to Array. Sadly, that would include Array(T, iter), but that's no worse than collect itself.

For Array{T} and Array{T,N} the only unambiguous signature available is Array{T}(iter, dims::Tuple). That might be ok sometimes, but unfortunately Array{T}(iter) is what we'd want most.

One option is to deprecate e.g. Array{T}(1,2,3) to Array{T,3}(1,2,3). Or we could require something like Array{T}(uninitialized, dims) for the current behavior, explicitly requesting uninitialized memory (uninitialized would be a special constant).

JeffBezanson avatar Aug 10 '17 20:08 JeffBezanson

Since collect produces a vector, what about spelling this as Vector(itr) and Vector{T}(itr)?

Edit: I realized the problem with this as soon as I posted it – Vector(n) and Vector{T}(n) alread mean something and integers are iterable. Sad trombone...

StefanKarpinski avatar Aug 11 '17 15:08 StefanKarpinski

ref https://github.com/JuliaLang/julia/issues/15120

vtjnash avatar Sep 05 '17 22:09 vtjnash

Arrays are at this point one of the only collections that doesn't have constructors that take arbitrary iterators of contents to construct with, making them quite the odd one out. Perhaps we should consider deprecating Vector(dims...) and Vector(dims) instead and have blank(T, dims...) and blank(T, dims) (or maybe uninitialized?) functions similar to zeros or ones instead. Then Vector(itr) would be available for this kind of construction.

StefanKarpinski avatar Sep 06 '17 04:09 StefanKarpinski

A blank what? Why isn't Dict() also blank? Should it be blank(Array{T}, dims)? Then I can have blank(MyAbstractArray{T}, args...)?

timholy avatar Sep 06 '17 07:09 timholy

Then I can have blank(MyAbstractArray{T}, args...)?

That's essentially what #16740 implements, for ones, zeros etc.

fredrikekre avatar Sep 06 '17 07:09 fredrikekre

If we decided to get rid of #undef so that creating an uninitialized array no longer makes sense (except maybe as an optimization in very specific cases), we could require choosing between zeros, ones and nulls to construct an array. In Nulls.jl, the latter function creates an Array{Union{Null, T}} filled with nulls, which can now be converted without making a copy to an Array{T} once all nulls have been replaced with valid values. (Cf. https://github.com/JuliaLang/julia/issues/9147.)

nalimilan avatar Sep 06 '17 08:09 nalimilan

The name junk came up on triage this week and everyone seemed to rather like it. As in, if you want a matrix full of zeros, you write zeros(10, 10), if you want a matrix full of ones, you write ones(10, 10), and if you want a matrix full of junk, you write junk(10, 10) 😁. If we went with junk we could reclaim Vector to do what collect currently does.

StefanKarpinski avatar Oct 26 '17 20:10 StefanKarpinski

With that, Array can do what collect currently does. Is the plan to deprecate collect? Just that collect has the ability to choose the right container for an iterator, which does not need to be dense array as it is now. For example an RaggedArray.jl can support (ragged) generators which then can collect to RaggedArrays etc. etc.

mschauer avatar Nov 06 '17 22:11 mschauer

You would just write RaggedArray(itr) for that case.

StefanKarpinski avatar Nov 07 '17 17:11 StefanKarpinski

... if you created itr yourself, of course. But Julia has a rudimentary system in place where iterators inform about the mutable collection they are naturally collected into via traits - and collect(itr) is a way to hook into this system where itr is just any iterable argument to a function something(itr). Replacing collect by Array makes this system more restrictive, retaining both collect and Array leaves it open for extension. In the end this is a design decision and I'll better not keep discussing, but those options are not equivalent.

mschauer avatar Nov 07 '17 18:11 mschauer

I guess we could allow using AbstractArray(itr) to let itr choose the most appropriate container type (with Array as a fallback)?

nalimilan avatar Nov 07 '17 18:11 nalimilan

If a has non-1 indices, currently convert(Vector, a) and collect(a) behave differently: the first throws an error and the second treats a as a list starting with the first item. Which would Vector(a) do?

Sorry I didn't mention this earlier.

timholy avatar Nov 07 '17 22:11 timholy

Which would Vector(a) do?

For general iterable a, Vector(a) should treat a as a list starting with the first item? Best!

Sacha0 avatar Nov 11 '17 19:11 Sacha0