julia icon indicating copy to clipboard operation
julia copied to clipboard

make Base.setindex() public

Open aplavin opened this issue 1 year ago • 15 comments

setindex() is a widely used well-documented function, it should be marked public, right?

Not sure what's the right way to make a function with multiple methods public, please correct me if I'm wrong here!

aplavin avatar Jul 15 '24 17:07 aplavin

I'm missing the point of this PR, as far as I can tell this is already public:

% julia +nightly -q
julia> Base.ispublic(Base, :setindex!)
true

giordano avatar Jul 15 '24 17:07 giordano

Note the difference:

julia> Base.ispublic(Base, :setindex!)
true

julia> Base.ispublic(Base, :setindex)
false

aplavin avatar Jul 15 '24 17:07 aplavin

Not sure what's the status here. Should I do some update of this PR or ...?

aplavin avatar Jul 21 '24 19:07 aplavin

If setindex is to be made public, it should be documented, i.e., its doc string should be included into the docs. Thus the "needs docs" label.

nsajko avatar Jul 21 '24 21:07 nsajko

I'm not familiar with documentation generation, could use some pointers on where to update things :) Of course, assuming that this change is approved.

aplavin avatar Jul 22 '24 00:07 aplavin

Something like

diff --git a/doc/src/base/base.md b/doc/src/base/base.md
index d7e7fff..f09128a 100644
--- a/doc/src/base/base.md
+++ b/doc/src/base/base.md
@@ -137,6 +137,7 @@ Core.typeassert
 Core.typeof
 Core.tuple
 Base.ntuple
+Base.setindex
 Base.objectid
 Base.hash
 Base.finalizer

nsajko avatar Jul 22 '24 08:07 nsajko

If the function is to be made public, we'd need to decide if it's just for tuples, i.e, will adding methods for other collection types be allowed. Perhaps we could instead define something like setindex_tuple(t::Tuple, v, i) = setindex(t, v, i) and then make only setindex_tuple public.

Adding new methods would negatively affect abstract return type inference in the extreme case where the type of the input collection is unknown (inferred as Any), quite possibly requiring the ecosystem to add some type asserts for performance.

nsajko avatar Jul 22 '24 08:07 nsajko

(Removing my review request as @nsajko has got this handled 😉)

ararslan avatar Jul 23 '24 01:07 ararslan

Not sure what you exactly the issue with extending setindex and what exactly you suggest, @nsajko. Currently, setindex is defined (& documented) in Base for tuples and namedtuples. It's also used by lots of packages, with many more methods defined for setindex – eg, StaticArrays do that.

As the semantics of the function is pretty clear, and it's already extensively used in the wild (including new method definitions), what specific issues do you see with marking it public?


Perhaps we could instead define something like setindex_tuple(t::Tuple, v, i) = setindex(t, v, i) and then make only setindex_tuple public.

Tbh, I don't see a reason to do that. Defining separate functions for every type, instead of methods of the same functions, goes directly against Julia multiple dispatch design and practice.

aplavin avatar Jul 24 '24 09:07 aplavin

I'm referring to the "world splitting" optimization, see the Base.Experimental.@max_methods doc string. I agree that Base.setindex should probably be available for all collection types. But, it's good to consider the possible downsides upfront, seeing as, in a fresh REPL session, Base.setindex has just three methods - below the max_methods default.

EDIT: I only mentioned this issue for completeness. IMO it's secondary to proper documentation.

nsajko avatar Jul 24 '24 13:07 nsajko

FWIW, given that setindex is indeed already used in many packages for some reason, and its interface seems OK, I now agree that it should probably be made public.

nsajko avatar Jul 24 '24 13:07 nsajko

I'm referring to the "world splitting" optimization, <...>

Still not sure how these considerations are relevant to marking setindex as public. I don't propose adding any more methods in Base here, although, I would like that as a completely separate independent PR (https://github.com/JuliaLang/julia/issues/43155, https://github.com/JuliaLang/julia/issues/54508, https://github.com/JuliaLang/julia/pull/43338, https://github.com/JuliaLang/julia/pull/33495). Marking it as public should just reflect the practical status quo: this function is widely used & overloaded.

Btw, a lot of high-performance code uses StaticArrays, and with them loaded one has at least 6 setindex methods.

aplavin avatar Jul 24 '24 13:07 aplavin

bump...

aplavin avatar Jul 30 '24 10:07 aplavin

bump :)

aplavin avatar Aug 12 '24 05:08 aplavin

bump... anything else I should do here?

aplavin avatar Aug 21 '24 11:08 aplavin

and another bump

aplavin avatar Sep 02 '24 20:09 aplavin

bump, anyone?..

aplavin avatar Sep 19 '24 10:09 aplavin

should be a tiny change, any issues with it?..

aplavin avatar Oct 03 '24 12:10 aplavin

triage approves. @aplavin add a news entry and we can merge.

oscardssmith avatar Nov 07 '24 02:11 oscardssmith

As the semantics of the function is pretty clear

Actually the semantics are not at all clear. The behavior of Base.setindex is very different for tuples and named tuples. For tuples, an out-of-bounds index throws a BoundsError, whereas for named tuples an out-of-bounds index extends the named tuple.

julia> Base.setindex((1, ), 10, 2)
ERROR: BoundsError: attempt to access Tuple{Int64} at index [2]
Stacktrace:
 [1] setindex(x::Tuple{Int64}, v::Int64, i::Int64)
   @ Base ./tuple.jl:57
 [2] top-level scope
   @ REPL[34]:1

julia> Base.setindex((a=1, ), 10, :b)
(a = 1, b = 10)

Ideally a generic function should have one docstring (per arity). In other words, there should only be one meaning for a generic function. Consider the Base.setindex! docstring, which makes no mention of the input types:

"""
    setindex!(collection, value, key...)

Store the given value at the given key or index within a collection. The syntax `a[i,j,...] =
x` is converted by the compiler to `(setindex!(a, x, i, j, ...); x)`.

...
"""

(I know there is a second docstring for setindex!, but let's ignore that for the purposes of this discussion.) Note how the docstring provides a single generic definition that can work for many different collection types.

I don't think Base.setindex in its current form should be made public. I see two possible paths forward:

  1. Change the behavior of Base.setindex(nt::NamedTuple, v, i) so that it errors on an out-of-bounds index. Then the docstring for Base.setindex can be the following:

    """
        setindex(x, v, i)
    
    Create a shallow copy of collection `x` with the value at index `i` set
    to `v`. Throw a `BoundsError` when `i` is out-of-bounds.
    """
    function setindex end
    
    
  2. Make two public functions (names can be bikeshed), something like the following:

    """
        setindex(x, v, i)
    
    Create a shallow copy of collection `x` with the value at index `i` set
    to `v`. Throw a `BoundsError` when `i` is out-of-bounds.
    """
    function setindex end
    
    """
        setkey(x, v, k)
    
    Create a shallow copy of collection `x` with a new key `k` set
    to `v`. If `k` is already in the keys of `x`, `v` replaces
    the old value.
    """
    function setkey end
    

If we decide to add setindex and setkey to Base, then we might as well export them.

CameronBieganek avatar Nov 12 '24 15:11 CameronBieganek

Comparison to setindex! does not support that argument, actually:

julia> setindex!([1], 10, 2)
ERROR: BoundsError: attempt to access 1-element Vector{Int64} at index [2]
Stacktrace:
 [1] setindex!(A::Vector{Int64}, x::Int64, i1::Int64)
   @ Base ./array.jl:1021
 [2] top-level scope
   @ REPL[6]:1

julia> setindex!(Dict(:a => 1), 10, :b)
Dict{Symbol, Int64} with 2 entries:
  :a => 1
  :b => 10

So the consequence would be to write a docstring for setindex that does not talk about index bounds at all for consistency's sake?

martinholters avatar Nov 13 '24 09:11 martinholters

My comparison to the setindex! docstring was only meant to demonstrate what a generic docstring looks like, but I guess it was not the best example. The iterate docstring is a better example:

    iterate(iter [, state]) -> Union{Nothing, Tuple{Any, Any}}

Advance the iterator to obtain the next element. If no elements remain, `nothing` should
be returned. Otherwise, a 2-tuple of the next element and the new iteration state
should be returned.

Note that the docstring provides a single definition for iterate regardless of the input types.

But you bring up an interesting point. In my previous comment I explicitly ignored the second docstring for setindex!. So, setindex! actually has at least three distinct behaviors, depending on the input type:

Throw a bounds error for an array:

julia> x = [1, 2];

julia> x[3] = 10
ERROR: BoundsError: attempt to access 2-element Vector{Int64} at index [3]

Extend a dictionary:

julia> d = Dict(:a => 1);

julia> d[:b] = 2
2

Set multiple indices at once for an array:

julia> a = [1, 2, 3, 4];

julia> a[[2, 3]] = [10, 20];

julia> a
4-element Vector{Int64}:
  1
 10
 20
  4

So, setindex! is a very non-generic function. I guess that's ok, since setindex! is a bit special due to the square-bracket syntax sugar and it's importance for array indexing. Not every function in Julia has to be a generic function. There are some notable exceptions to the goal of having generic functions, like constructors and convert, where the exact behavior depends on the input type. But I think we should generally prefer to make Base Julia functions generic whenever possible. Base.setindex does not have syntax sugar and it is not ubiquitously used like setindex!, so I think it makes sense to make it generic (and we can add setkey if desired).

So the consequence would be to write a docstring for setindex that does not talk about index bounds at all for consistency's sake?

No, that's not the consequence. Please take a look at the example docstrings that I provided in my previous post.

CameronBieganek avatar Nov 13 '24 16:11 CameronBieganek

Sorry, forgot about this for a while!

triage approves. @aplavin add a news entry and we can merge.

Added.

aplavin avatar Jan 04 '25 15:01 aplavin

It looks like my concern about the generic definition of setindex still has not been addressed. What happens when an out-of-bounds key is provided? Does it throw an error or does it extend the collection? I think the only plausible solution is to make two functions, setindex and setkey (name can be bikeshed), as I described above.

I think setindex should be defined for tuples but not for named tuples, and setkey should be defined for named tuples but not tuples. In other words, this really needs to be a method error: setkey((1, ), 2, 1_000_000).

CameronBieganek avatar Jan 08 '25 04:01 CameronBieganek

I think the only reasonable way to define setindex semantics is to follow setindex! as closely as possible. Anything else would be very confusing. AFAICT, this is the case with the current setindex implementation: throws an OOB error for tuples like setindex! for arrays, adds a new element for named tuples like setindex! for dicts.

This function is well-defined and documented, naturally leading to a widespread use in many packages. I can imagine

I think setindex should be defined for tuples but not for named tuples

would cause quite some breakage if implemented. That's one of the reasons to mark it public – because it effectively is already.

I haven't seen new triage comments here, so assuming they still agree with the PR.

aplavin avatar Jan 08 '25 12:01 aplavin

I haven't seen new triage comments here, so assuming they still agree with the PR.

My concerns were raised after that triage call, so triage may not have considered the issues that I raised.

I think the only reasonable way to define setindex semantics is to follow setindex! as closely as possible. Anything else would be very confusing.

I agree there is some logic to that, but setindex! is very special because it has syntax sugar that is ubiquitously used. Because square bracket indexing is used so ubiquitously, it's not surprising that the various uses do not behave in a consistent, generic way. However, Base.setindex is not special and it is not nearly as ubiquitous. It is overloaded in StaticArrays and Accessors. Where else?

setindex does not even need to be defined in Base. In Accessors.jl you can define your own setindex function with any behavior you like (though I would still prefer a generic definition).

That's one of the reasons to mark it public – because it effectively is already.

We shouldn't be encouraging the usage of Base internals. This PR sets a bad precedent. Once someone starts using a Base internal they can open an issue requesting that the internal is marked public because "it's used widely in the ecosystem."


Consider the following two code chunks. The first one uses setindex and the second one uses setkey, as I defined them above. Note that there is no mention of the types of the values that are used in either example. Yet the examples can be understood to be correct due to the generic definitions of setindex and setkey.

setindex example:

b = try
    setindex(a, v, i)
catch e
    e isa BoundsError ? a : rethrow()
end

setkey example:

The following will never throw a BoundsError:

b = setkey(a, v, k)

Of course, it could throw other errors due to type mismatches of the inputs. For ultimate clarity, the following should probably be added to the docstring for setkey (and similarly for setindex):

"""
    setkey(x, v, k)

...

The following must be true, or else an error will be thrown:

- `typeof(k) <: keytype(x)`
- `typeof(v) <: valtype(x)`
"""

I think setindex should be defined for tuples but not for named tuples

would cause quite some breakage if implemented.

Ultimately, Base.setindex is not public, so Base has no backwards compatibility obligations for Base.setindex. Though I agree that it would make sense for both setkey and setindex (as I have defined them) to work on named tuples. The important thing is that setkey(("a", ), "b", 1_000_000) should be a method error.

Actually, named tuples are a weird beast... Arguably the second Base.setindex call below should work. (In my formulation, the first call would be a setkey call and the second call would be a setindex call, and they would both work.)

julia> nt = (a = 1, b = 2);

julia> Base.setindex(nt, 100, :c)
(a = 1, b = 2, c = 100)

julia> Base.setindex(nt, 100, 2)
ERROR: MethodError: no method matching setindex(::@NamedTuple{a::Int64, b::Int64}, ::Int64, ::Int64)

CameronBieganek avatar Jan 09 '25 00:01 CameronBieganek

Note that setindex is not some obscure function, it's used and sometimes overloaded in many packages. I don't really have more arguments for making it public than

  • it's a widely used well-documented function with clear and accepted semantics
  • if setindex exists, it should be as consistent with setindex! as possible to avoid confusion; and that's exactly the current behavior.

It's clear that you disagree. Neither of us can merge this PR anyways, so now it's up to Julia developers.

aplavin avatar Jan 09 '25 18:01 aplavin

Semiannual bump

aplavin avatar Jul 08 '25 22:07 aplavin

Sorry this has languished for so long, @aplavin. Looks like this will need to be rebased since there's a merge conflict. @oscardssmith, since you commented on behalf of triage when the group discussed this PR last year, can you speak to whether any of the concerns raised in this thread (after triage occurred, so would have been raised independently) were discussed at the time, or if there were any takeaways beyond "merge it"?

ararslan avatar Jul 09 '25 07:07 ararslan

To be fully honest, I don't really remember the triage discussion here.

oscardssmith avatar Jul 11 '25 03:07 oscardssmith