Tidy up developer functions
These have changed a lot recently and I want to get my head clearly around these functions, and how we package them together and document them. I'm starting this issue to flag it but will continue to develop the notes here.
Functions affected:
- object2id()
- object2fixed()
- pattern2id()
- pattern2fixed()
- index_types()
- index() (aka locate())
Should also address #2062
So what I meant in this PR is that the object2*() and pattern2*() functions especially are important building blocks of our functionality that could be useful to other developers (or to our future selves or other quanteda developers). They work in similar ways, but it's not clear which should be used when.
NOTE: We don't necessarily need these tidied up before the v3 release, since they are internal, but I think that tidying them up could help meet the goal you expressed of promoting core functions for developers, for instance if we create a developer vignette that covers some of our internal functions and structures.
> pattern <- list(c("^a$", "^b"), c("c"), c("d"))
> types <- c("A", "AA", "B", "BB", "BBB", "C", "CC")
> pattern2fixed(pattern, types, "regex", case_insensitive = TRUE)
[[1]]
[1] "A" "B"
[[2]]
[1] "A" "BB"
[[3]]
[1] "A" "BBB"
[[4]]
[1] "C"
[[5]]
[1] "CC"
> object2fixed(pattern, types, "regex", case_insensitive = TRUE)
$`^a$ ^b`
[1] "A" "B"
$`^a$ ^b`
[1] "A" "BB"
$`^a$ ^b`
[1] "A" "BBB"
$c
[1] "C"
$c
[1] "CC"
I wonder why we do not consolidate them into pattern2*(), since the objects that object2*() accepts are also valid inputs listed in ?pattern.
Also, the *2id() functions are like lapply() over match(). The return value described in ?match:
match: An integer vector giving the position in table of the first match if there is a match, otherwise nomatch. Would it make more sense to describe the functions this way, and potentially rename them to reflect the similarity with match()?
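To make the analogy concrete, here is a minimal base R sketch of the lapply() over match() idea. It covers exact (fixed) matching only and is not how the *2id() functions are implemented; the `pattern` and `ids` values are made up for illustration. Note that match() returns NA for a pattern with no match, whereas the pattern2fixed() output above suggests that unmatched patterns are dropped instead.

```r
# Illustration only: exact matching with base R, no quanteda internals involved.
types <- c("A", "AA", "B", "BB", "BBB", "C", "CC")
pattern <- list(c("A", "B"), "C", "D")

# For each pattern, look up the position of each element in the type vector.
ids <- lapply(pattern, match, table = types)

# ids[[1]] holds the positions of "A" and "B" in types;
# ids[[3]] is NA because "D" is not among the types.
print(ids)
```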
object2*() takes various objects such as dictionaries and collocations. It depends on pattern2*().
*2id() is the underlying function that returns positions in the type vector, so pattern2id() is the mother of all the functions.
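The relationship between the id and fixed variants can be sketched in base R as follows. This is a rough illustration under stated assumptions, not quanteda's implementation: `sketch_pattern2id()` and `sketch_pattern2fixed()` are hypothetical names, only regex matching is handled, and dropping sequences with an unmatched element is inferred from the example output earlier in this issue.

```r
# Hypothetical helper: returns positions in `types` for each expanded pattern.
sketch_pattern2id <- function(pattern, types, case_insensitive = TRUE) {
  out <- list()
  for (p in pattern) {
    # Positions in `types` matched by each regex in the sequence.
    hits <- lapply(p, grep, x = types, ignore.case = case_insensitive)
    # Assumption: a sequence with any unmatched element yields nothing.
    if (any(lengths(hits) == 0)) next
    # Every combination of matched positions becomes one id sequence.
    combos <- expand.grid(hits, KEEP.OUT.ATTRS = FALSE)
    for (i in seq_len(nrow(combos))) {
      out[[length(out) + 1]] <- as.integer(unlist(combos[i, ], use.names = FALSE))
    }
  }
  out
}

# Hypothetical helper: the fixed variant is just the ids mapped back to types.
sketch_pattern2fixed <- function(pattern, types, case_insensitive = TRUE) {
  ids <- sketch_pattern2id(pattern, types, case_insensitive)
  lapply(ids, function(i) types[i])
}

pattern <- list(c("^a$", "^b"), c("c"), c("d"))
types <- c("A", "AA", "B", "BB", "BBB", "C", "CC")
ids <- sketch_pattern2id(pattern, types)
fixed <- sketch_pattern2fixed(pattern, types)
print(fixed)
```

With the inputs from the example above, the sketch yields the same five sequences as the pattern2fixed() call, which is the sense in which the id function sits underneath the others.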