austraits.build
Refinements to automated substitutions
In some circumstances the automated substitutions code (process.R, line 971) currently requires long lists of substitutions, and could perhaps be refined.
Since it only matches entire strings, whenever a value contains multiple categorical terms and one of them needs to be changed, every distinct combination containing that term needs its own substitution. For instance, to change procumbent to prostrate, you would only have to replace the term 6 times using some variant of str_replace, but you would have to add 97 different substitutions.
From the growth_form branch:
> austraits$traits %>%
+ filter(trait_name == "stem_growth_habit") %>% filter(value == "procumbent") %>% distinct(dataset_id,value)
# A tibble: 6 × 2
dataset_id value
<chr> <chr>
1 Flora_Florabase procumbent
2 Flora_NT procumbent
3 Flora_of_Australia procumbent
4 Flora_PlantNet procumbent
5 Flora_SA procumbent
6 Flora_VicFlora procumbent
> austraits$traits %>%
+ filter(trait_name == "stem_growth_habit") %>% filter(str_detect(value, "procumbent")) %>% distinct(dataset_id,value)
# A tibble: 97 × 2
dataset_id value
<chr> <chr>
1 Flora_Florabase procumbent scrambling
2 Flora_Florabase procumbent spreading
3 Flora_Florabase compact erect procumbent sprawling
4 Flora_Florabase bushy erect procumbent
5 Flora_Florabase bushy procumbent spreading
6 Flora_Florabase erect procumbent spreading
7 Flora_Florabase erect procumbent
8 Flora_Florabase procumbent prostrate
9 Flora_Florabase procumbent
10 Flora_Florabase decumbent procumbent prostrate
# … with 87 more rows
# ℹ Use `print(n = ...)` to see more rows
This gets even harder to fix when the words are entered in the data.csv file in non-alphabetical order: the output is sorted alphabetically, so it is tedious to look up each term in data.csv to figure out why a substitution isn't "working".
Could the code be rewritten to replace all instances of a term, rather than an exact string match?
(I also occasionally struggle with capital letters in the input causing substitutions to fail, but this shouldn't be a problem, should it?)
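A minimal sketch of what a term-level replacement could look like, in base R. Note that replace_term and its arguments are hypothetical names, not functions in process.R. The sketch assumes that values are space-separated terms, and that it is acceptable to lowercase, de-duplicate and alphabetically sort the terms in the output.

```r
# Hypothetical helper (not part of process.R): replace one whole term
# inside space-separated categorical values.
replace_term <- function(values, find, replace, ignore_case = TRUE) {
  if (ignore_case) find <- tolower(find)
  vapply(strsplit(values, " ", fixed = TRUE), function(terms) {
    # Leave missing values untouched
    if (anyNA(terms)) return(NA_character_)
    # Drop empty strings produced by repeated spaces
    terms <- terms[nzchar(terms)]
    # Assumption: lowercasing every term is acceptable for these values
    if (ignore_case) terms <- tolower(terms)
    # Exact match on the term, so "procumbent" never matches inside a longer word
    terms[terms == find] <- replace
    # De-duplicate and sort so the result matches the alphabetical output order
    paste(sort(unique(terms)), collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

values <- c("Procumbent",
            "erect procumbent spreading",
            "decumbent procumbent prostrate")
replace_term(values, "procumbent", "prostrate")
# "prostrate" "erect prostrate spreading" "decumbent prostrate"
```

With something like this, one find/replace pair would cover every combination containing the term, regardless of the order or capitalisation used in data.csv. A similar effect might be achievable with stringr's str_replace_all and a word-boundary pattern such as "\\bprocumbent\\b", although that alone would not remove the duplicate that arises when a value already contains prostrate.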