Bogumił Kamiński
When a table does not have `nrow` or `ncol` defined, we can describe in their docstrings what they should return. `nothing` seems a reasonable value.
Currently the contract of `describe` is that it pretty-prints the passed object. The contract does not say what the function returns. I propose that `describe` should keep to print what...
Both SplitApplyCombine.jl and DataFrames.jl export `flatten`. I would add it to DataAPI.jl. The question is what docstring it should have? Maybe something like: > Flatten collection of collections into a...
We currently have the following design issue: ``` julia> vcat([1,2,3], PooledArray([1,2,3])) ERROR: MethodError: vcat(::Vector{Int64}, ::PooledVector{Int64, UInt32, Vector{UInt32}}) is ambiguous. julia> vcat(PooledArray([1,2,3]), [1,2,3]) ERROR: MethodError: vcat(::PooledVector{Int64, UInt32, Vector{UInt32}}, ::Vector{Int64}) is ambiguous....
This is a WIP showing the changes needed so that pools can be shared between `PooledArray`s that allow and disallow missing values. It seems we can have this without many complications. Please...
This fixes the following problem: ``` julia> x = PooledArray(["a", "b"]) 2-element PooledVector{String, UInt32, Vector{UInt32}}: "a" "b" julia> y = resize!(PooledArray(String[]), 2) 2-element PooledVector{String, UInt32, Vector{UInt32}}: #undef #undef julia> copyto!(x,...
If the length of the pool is much smaller than the number of entries, we can run the following (working code): ``` function median_fast(x::PooledVector) n = length(x) p = sortperm(x.pool) counts =...
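The idea above can be sketched without PooledArrays.jl, using a plain integer vector `refs` and a plain `pool` vector so that element `i` has the value `pool[refs[i]]`. This is only an illustration of the counting approach under those assumptions, not the truncated `median_fast` from the issue. The name `median_from_pool` and its two-argument signature are hypothetical.

```julia
# Hypothetical helper: median of the values `pool[refs[i]]`.
# Assumes every entry of `refs` is a valid index into `pool`.
function median_from_pool(refs::AbstractVector{<:Integer},
                          pool::AbstractVector{<:Real})
    n = length(refs)
    n == 0 && throw(ArgumentError("median of an empty collection is undefined"))

    # Count how many times each pool entry is referenced: O(n).
    counts = zeros(Int, length(pool))
    for r in refs
        counts[r] += 1
    end

    # Sort only the pool, which is cheap when the pool is small.
    p = sortperm(pool)

    lo_rank = (n + 1) ÷ 2   # rank of the lower middle element
    hi_rank = n ÷ 2 + 1     # rank of the upper middle element (same if n is odd)

    lo_val = nothing
    seen = 0
    for i in p
        counts[i] == 0 && continue   # pool entry not used by any element
        seen += counts[i]
        if lo_val === nothing && seen >= lo_rank
            lo_val = pool[i]
        end
        if seen >= hi_rank
            # Average the two middle values, as `Statistics.median` does.
            return (lo_val + pool[i]) / 2
        end
    end
    error("refs and pool are inconsistent")
end
```

The cost is one pass over the data plus a sort of the pool, instead of a sort or partial sort of all entries, which is why it only pays off when the pool is much smaller than the number of entries.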
In some practical cases `SentinelVector` is much slower than `Vector`. For example, for the data tested in https://bkamins.github.io/julialang/2022/12/23/duckdb.html we have: ``` julia> summary(posts) "42710197×3 DataFrame" julia> typeof.(eachcol(posts)) 3-element Vector{DataType}: SentinelArrays.ChainedVector{Union{Missing, Int64},...
Things to do: * treat `missing` as a special value that is not pooled, probably with level `0`. This would work the same as in CategoricalArrays.jl; the benefit is that...
CategoricalArrays.jl handles `DataAPI.refarray`, `DataAPI.refvalue`, and `DataAPI.refpool` correctly for views. I propose to add the same for PooledArrays.jl (currently, views return `nothing` from `DataAPI.refpool`).