Category to integer mapping: bikeshedding session
After merging #52, we should provide a tool that constructs a mapping from the levels of a PooledDataArray to integers. This function should make clear that the mapping is ad hoc and unrelated to the underlying representation of the data.
Should we call it levelsmap?
I believe this is a very basic device that is useful outside of this package. For example, it may also be useful in constructing contingency tables (see https://github.com/JuliaStats/Stats.jl/issues/32).
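As a rough illustration of the contingency-table use case, here is a minimal sketch. `levelcodes` and `crosstab` are hypothetical names, not part of Stats.jl; each distinct value is numbered 1..n in order of first appearance, and those numbers index the rows and columns of a count matrix.

```julia
# Hypothetical helper: map each distinct value to 1..n, in order of first appearance.
levelcodes(xs) = Dict(v => i for (i, v) in enumerate(unique(xs)))

# Hypothetical two-way contingency table built on the level -> integer mapping.
function crosstab(xs::AbstractVector, ys::AbstractVector)
    length(xs) == length(ys) ||
        throw(DimensionMismatch("xs and ys must have the same length"))
    xmap, ymap = levelcodes(xs), levelcodes(ys)
    # Rows index the levels of xs, columns index the levels of ys.
    counts = zeros(Int, length(xmap), length(ymap))
    for (x, y) in zip(xs, ys)
        counts[xmap[x], ymap[y]] += 1
    end
    return counts, xmap, ymap
end

counts, xmap, ymap = crosstab(["a", "b", "a", "a"], ["u", "v", "v", "u"])
```

Here `counts == [2 1; 0 1]`: rows correspond to `"a"`, `"b"` and columns to `"u"`, `"v"`. The integer codes only serve as array positions, which is exactly why an ad hoc mapping is enough.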
What if we implement this in Stats.jl, and thus provide such support to other packages that may also want it?
That works for me. What interface will we use? Something like `levelsmap(["a", "b", "A", "A"]) -> ["a" => 1, "b" => 2, "A" => 3]`?
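A minimal sketch of that interface, assuming the result is a `Dict` and that levels are numbered in order of first appearance (the thread does not settle either point). The body below is a hypothetical implementation, not an existing Stats.jl function.

```julia
# Hypothetical `levelsmap`: assign 1, 2, ..., n to the distinct values of `xs`
# in order of first appearance. The numbering is ad hoc and says nothing
# about how the data is stored.
function levelsmap(xs::AbstractVector{T}) where {T}
    d = Dict{T,Int}()
    for x in xs
        # Insert the next free code only if `x` has not been seen yet.
        get!(d, x, length(d) + 1)
    end
    return d
end

m = levelsmap(["a", "b", "A", "A"])
# m == Dict("a" => 1, "b" => 2, "A" => 3)
```

A `Dict` does not preserve insertion order when printed, so if callers care about the order of the levels, a vector of levels would have to be returned alongside it.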
FWIW, there's already `indexmap` in Stats.jl (see https://github.com/JuliaStats/Stats.jl#miscelleneous-functions).
We don't want to use that as our canonical numbering, right? That could produce numbers that are spaced very unevenly.
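To make the spacing concern concrete, the sketch below compares two numberings of the same vector. It assumes `indexmap` maps each distinct value to the index of its first occurrence; the code reproduces that assumed behavior inline rather than calling Stats.jl.

```julia
xs = ["a", "a", "a", "b", "b", "c"]
lv = unique(xs)  # distinct values, in order of first appearance

# Assumed indexmap-style numbering: value -> index of its first occurrence.
firstidx = Dict(v => findfirst(==(v), xs) for v in lv)

# Dense numbering: value -> position among the distinct values.
dense = Dict(v => i for (i, v) in enumerate(lv))

# firstidx == Dict("a" => 1, "b" => 4, "c" => 6)   (gaps between codes)
# dense    == Dict("a" => 1, "b" => 2, "c" => 3)   (codes are exactly 1:3)
```

Only the dense codes can be used directly as row or column positions in an array of size `length(lv)`.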
Oh, yes... we could then add a `levelsmap` method for this?
That seems right to me. Should I submit a PR to discuss implementation details?
Sure.
Thinking about it more, we may want a data structure that maintains a cross-reference between levels and indices.
Something along these lines?
```julia
struct LevelMap{T}
    levels::Vector{T}      # index -> level
    indmap::Dict{T,Int}    # level -> index
end
```
Along with some functions to make the translation convenient.
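A fuller sketch of how `LevelMap` might be built and used, in current Julia syntax (`struct` rather than `immutable`). The outer constructor and the `encode`/`decode` helpers are hypothetical names chosen for illustration.

```julia
# LevelMap keeps both directions of the level <-> index mapping.
struct LevelMap{T}
    levels::Vector{T}      # index -> level
    indmap::Dict{T,Int}    # level -> index
end

# Hypothetical outer constructor: derive both directions from raw data,
# numbering distinct values 1..n in order of first appearance.
function LevelMap(xs::AbstractVector{T}) where {T}
    lv = unique(xs)
    indmap = Dict{T,Int}(v => i for (i, v) in enumerate(lv))
    return LevelMap{T}(lv, indmap)
end

# Hypothetical translation helpers.
encode(m::LevelMap, x) = m.indmap[x]            # level -> index (KeyError if unknown)
decode(m::LevelMap, i::Integer) = m.levels[i]   # index -> level

m = LevelMap(["a", "b", "A", "A"])
codes = [encode(m, x) for x in ["A", "a", "b"]]   # [3, 1, 2]
back = decode.(Ref(m), codes)                     # ["A", "a", "b"]
```

Keeping `levels` as a vector also fixes an order for the levels, which a bare `Dict` does not.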
Not sure. For involved indexing operations, it seems like you'd want to maintain all of the indices for each level, since that would make it much easier to repeat the levels calculation on subsets of the data.
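One way to read that suggestion, as a hypothetical sketch: store every position of each level, so that the levels present in a subset of rows can be recovered from the stored index lists. `levelindices` is an illustrative name, not an existing function.

```julia
# Hypothetical: map each level to all positions where it occurs.
function levelindices(xs::AbstractVector{T}) where {T}
    d = Dict{T,Vector{Int}}()
    for (i, x) in enumerate(xs)
        # Create an empty index list the first time a level is seen.
        push!(get!(() -> Int[], d, x), i)
    end
    return d
end

xs = ["a", "b", "a", "c", "b"]
idx = levelindices(xs)   # "a" => [1, 3], "b" => [2, 5], "c" => [4]

# Levels for the subset of rows 3:4: intersect each stored index list with
# the subset and drop levels that no longer occur.
rows = 3:4
sub = Dict(k => intersect(v, rows) for (k, v) in idx)
filter!(p -> !isempty(p.second), sub)   # "a" => [3], "c" => [4]
```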