OneHotArrays.jl icon indicating copy to clipboard operation
OneHotArrays.jl copied to clipboard

Add `setindex!`

Open mcabbott opened this issue 8 months ago • 4 comments

First commit has a version of the setindex! method suggested at https://github.com/FluxML/OneHotArrays.jl/issues/6#issuecomment-1061893293, which works like this:

julia> x = onehotbatch(1:3, 0:4)
5×3 OneHotMatrix(::Vector{UInt32}) with eltype Bool:
 ⋅  ⋅  ⋅
 1  ⋅  ⋅
 ⋅  1  ⋅
 ⋅  ⋅  1
 ⋅  ⋅  ⋅

julia> x[1] = 1;  # really this changes two values, OK?

julia> x[1,3] = 0;  # no effect

julia> x
5×3 OneHotMatrix(::Vector{UInt32}) with eltype Bool:
 1  ⋅  ⋅
 ⋅  ⋅  ⋅
 ⋅  1  ⋅
 ⋅  ⋅  1
 ⋅  ⋅  ⋅

julia> copyto!(x, onehotbatch([4,3,2], 0:4));  # calls setindex! 15 times

julia> x
5×3 OneHotMatrix(::Vector{UInt32}) with eltype Bool:
 ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅
 ⋅  ⋅  1
 ⋅  1  ⋅
 1  ⋅  ⋅

julia> x .= 0  # this is bad, illegal state?
5×3 OneHotMatrix(::Vector{UInt32}) with eltype Bool:
 ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅

Allowing you to replace the only 1 by 0 is what makes the copyto! example work, where this is a temporary state. But it also allows x .= 0 as in the last example, which leaves the array in a state we otherwise disallow:

julia> onehotbatch(1:10, 0:4, 0)  # with default
5×10 OneHotMatrix(::Vector{UInt32}) with eltype Bool:
 ⋅  ⋅  ⋅  ⋅  1  1  1  1  1  1
 1  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  1  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  1  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅  1  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅

julia> onehotbatch(1:10, 0:4)  # without default
ERROR: Value 10 not found in labels

I forgot about #10, which instead makes writing 0 always an error.

This comment https://github.com/FluxML/OneHotArrays.jl/issues/6#issuecomment-1059842533 suggested only allowing things like the following -- where a single setindex! call can be sure never to leave the array in an illegal state:

julia> x = onehotbatch(1:3, 0:4);

julia> x[:,3] = collect(onehot(0, 0:4))  # setindex! with array eventually calls this PR's scalar setindex! 5 times
5-element Vector{Bool}:
 1
 0
 0
 0
 0

julia> x
5×3 OneHotMatrix(::Vector{UInt32}) with eltype Bool:
 ⋅  ⋅  1
 1  ⋅  ⋅
 ⋅  1  ⋅
 ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅

PR Checklist

  • [ ] Tests are added
  • [ ] Documentation, if applicable

mcabbott avatar Apr 16 '25 22:04 mcabbott

Second commit makes overwriting a 1 with a 0 increment the label. Then copyto! still works, and x .= 0 gives an error rather than illegal answer. Although x .= 1 is a little strange:

julia> x = onehotbatch(1:3, 0:4)
5×3 OneHotMatrix(::Vector{UInt32}) with eltype Bool:
 ⋅  ⋅  ⋅
 1  ⋅  ⋅
 ⋅  1  ⋅
 ⋅  ⋅  1
 ⋅  ⋅  ⋅

julia> x[2] = 0  # changes 2 values, now
0

julia> x
5×3 OneHotMatrix(::Vector{UInt32}) with eltype Bool:
 ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅
 1  1  ⋅
 ⋅  ⋅  1
 ⋅  ⋅  ⋅

julia> x .= 1
5×3 OneHotMatrix(::Vector{UInt32}) with eltype Bool:
 ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅
 1  1  1

julia> x .= 0
ERROR: ArgumentError: `setindex!` here would leave the `OneHotArray` without a hot one (in this column)

mcabbott avatar Apr 16 '25 22:04 mcabbott

Codecov Report

Attention: Patch coverage is 81.81818% with 2 lines in your changes missing coverage. Please review.

Project coverage is 95.95%. Comparing base (0ea288d) to head (57f850d). Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
src/array.jl 81.81% 2 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #51      +/-   ##
==========================================
- Coverage   96.26%   95.95%   -0.32%     
==========================================
  Files           3        4       +1     
  Lines         134      173      +39     
==========================================
+ Hits          129      166      +37     
- Misses          5        7       +2     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Apr 16 '25 22:04 codecov[bot]

@mcabbott thank you for the PR. What is pending to get it merged?

juliohm avatar Apr 17 '25 12:04 juliohm

Mostly we need to be convinced this is doing the right thing. There were many other behaviours proposed earlier.

Are we OK with x .= 1 not giving an error?

Are there other surprising cases I haven't thought of? Things which are now errors & will become silently wrong answers? This seems a bit like the question of whether setindex!(::Symmetric, ...) should work off-diagonal. But here, it's harder to imagine some generic function writing into a OneHotArray in a way which gives wrong answers. Maybe prod! is an example?

julia> mul!(rand(Bool,3,3), rand(Bool,3,3), rand(Bool,3,3))  # problem for Symmetric
ERROR: InexactError: Bool(2)

julia> x = rand(Bool,4,2)
4×2 Matrix{Bool}:
 0  1
 1  1
 1  0
 1  1

julia> prod!(rand(Bool,4), x)
4-element Vector{Bool}:
 0
 1
 0
 1

julia> prod!(onehotbatch(3, 1:4), x)  # with this PR, often an error, here silenty wrong
4-element OneHotVector(::Array{UInt32, 0}) with eltype Bool:
 ⋅
 ⋅
 ⋅
 1

mcabbott avatar Apr 17 '25 16:04 mcabbott