Add ability to recursively filter output from get_* functions
E.g.,
get_tsn("Poa")
Many results are returned, so the user filters with a regex:
# some output printed, prompt given
# user types:
ann
# which filters to strings having "ann"
or by row number(s)
# some output printed, prompt given
# user types:
1:5
# which filters to rows 1 to 5
This could continue recursively until the user exits or ends up with only one result, at which point the id itself is returned.
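A minimal sketch of one filtering step, assuming the candidates sit in a data frame with a `target` name column. `filter_candidates` is a hypothetical helper, not an existing taxize function, and the ids below are made up for illustration. Input that looks like a row number or a `from:to` range selects rows; anything else is treated as a regex on the names.

```r
# Hypothetical helper: apply one round of user filtering to candidate rows.
filter_candidates <- function(df, input, col = "target") {
  if (grepl("^[0-9]+(:[0-9]+)?$", input)) {
    # Row number ("3") or range ("1:5"): parse the bounds by hand, no eval()
    bounds <- as.integer(strsplit(input, ":", fixed = TRUE)[[1]])
    rows <- seq(bounds[1], bounds[length(bounds)])
    rows <- rows[rows >= 1 & rows <= nrow(df)]
    out <- df[rows, , drop = FALSE]
  } else {
    # Anything else is treated as a regular expression on the name column
    out <- df[grepl(input, df[[col]]), , drop = FALSE]
  }
  rownames(out) <- NULL
  out
}

candidates <- data.frame(
  tsn = 1:4,  # made-up ids, for illustration only
  target = c("Poa alpina", "Poa annua", "Poa annua var. aquatica", "Poa pratensis"),
  stringsAsFactors = FALSE
)
filter_candidates(candidates, "ann")  # keeps the two rows containing "ann"
filter_candidates(candidates, "1:2")  # keeps rows 1 and 2
```

Calling the helper again on its own output gives the repeated narrowing described above.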
Thoughts, @EDiLD @zachary-foster?
@sckott, I just noticed you asked for thoughts on this. I tried running get_tsn("Poa") and get_tsn('Poa', ask=TRUE, rows = NA), but just got back a single result. Did something change in the last month? I also tried get_tsn('Satyrium'), another ambiguous taxon name, and only got back a single result.
Oh yeah, I forgot to share thoughts. I think it's a good idea if it does not take too much work to implement. Is it common for there to be that many homonyms for a taxon name? Or perhaps get_tsn("Poa") used to return the taxon ids for all of the species in that genus rather than the genus itself?
@zachary-foster yes, there have been some changes
There are two changes. First, get_tsn() now returns accepted names by default; see the accepted parameter.
For the case of Poa annua using ITIS data, the API call http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=poa%20annua results in just one name that is accepted, while all others are not accepted, so only one is returned.
Second, we now check for a direct match using grep(). If the regex returns only one match, we just return that one. If there is more than one match, we return all of them and the user is given the prompt, etc.
Does that make sense?
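Roughly, the direct-match step could look like the sketch below. This is not the actual taxize code: `pick_direct_match` is a hypothetical name, the anchored case-insensitive pattern is an assumption about what "direct match" means, and returning the full candidate table when there is not exactly one hit is one reading of "return all of them".

```r
# Hypothetical helper: prefer a single direct name match over prompting.
pick_direct_match <- function(df, name, col = "target") {
  # Anchored, case-insensitive match (an assumption, see above)
  hits <- grep(paste0("^", name, "$"), df[[col]], ignore.case = TRUE)
  if (length(hits) == 1) {
    df[hits, , drop = FALSE]  # exactly one direct match: return just that row
  } else {
    df                        # zero or several: keep everything, then prompt
  }
}

results <- data.frame(
  tsn = 1:3,  # made-up ids, for illustration only
  target = c("Poa", "Poa annua", "Poa pratensis"),
  stringsAsFactors = FALSE
)
pick_direct_match(results, "Poa")  # one exact hit, so a single row comes back
pick_direct_match(results, "Po")   # no exact hit, so all rows come back
```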
@zachary-foster for your second comment:
Hard to say how common multiple names are. It depends partly on how each data source structures its queries on the server side: some may use a fuzzier search, others more of a direct match. I don't think I've tried implementing this yet, so I'm not sure how hard it would be, but it's worth a try?
@sckott Ok, I understand now. Thanks for the explanation.
I think it's worth a try. I don't know if you meant "recursively" literally, but a while (nrow(tsn_df) > 1) {...} loop around the current user prompt code seems like it would work.
In the case of get_tsn, maybe something like (untested code):
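if (ask) {
  # Rename the column matching searchtype so it can be referenced as "target"
  names(tsn_df)[grep(searchtype, names(tsn_df))] <- "target"
  tsn_df <- tsn_df[order(tsn_df$target), ]
  rownames(tsn_df) <- seq_len(nrow(tsn_df))
  tsn <- NA
  att <- "not found"
  # Keep prompting until one row remains or the user gives unusable input
  while (nrow(tsn_df) > 1) {
    message("\n\n")
    print(tsn_df)
    message("\nMore than one TSN found for taxon '",
            x, "'!\n\n Enter row number of taxon, or a regex to filter (other inputs will return 'NA'):\n")
    take <- scan(n = 1, quiet = TRUE, what = "raw")
    if (length(take) == 0) {
      # Empty input: stop prompting and return NA
      att <- "nothing chosen"
      break
    }
    if (take %in% seq_len(nrow(tsn_df))) {
      # Row number chosen: keep only that row, which ends the loop
      tsn_df <- tsn_df[as.numeric(take), ]
      message("Input accepted, took taxon '", as.character(tsn_df$target),
              "'.\n")
    } else if (any(grepl(take, tsn_df$target))) {
      # Regex matched: narrow the candidates and prompt again if needed
      tsn_df <- tsn_df[grepl(take, tsn_df$target), ]
      rownames(tsn_df) <- seq_len(nrow(tsn_df))
    } else {
      # Neither a valid row number nor a matching regex: give up with NA
      mssg(verbose, "\nReturned 'NA'!\n\n")
      break
    }
  }
  if (nrow(tsn_df) == 1) {
    tsn <- tsn_df$tsn
    att <- "found"
  }
} else {
  tsn <- NA
  att <- "NA due to ask=FALSE"
}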
If you are worried about the possibility of infinite loops caused by while, maybe use a for (i in seq_len(max_prompts)) loop with an if (nrow(tsn_df) == 1) break.
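A bounded version might look something like this (untested against taxize itself). `narrow_down`, `read_input`, and `scripted` are hypothetical names; `read_input` stands in for the interactive scan() call so the loop can be exercised without a prompt, and the ids are made up.

```r
# Hypothetical sketch: cap the number of prompts so the loop always terminates.
narrow_down <- function(df, read_input, max_prompts = 5, col = "target") {
  for (i in seq_len(max_prompts)) {
    if (nrow(df) <= 1) break
    take <- read_input()
    if (grepl("^[0-9]+$", take) && as.integer(take) %in% seq_len(nrow(df))) {
      # A valid row number picks that row
      df <- df[as.integer(take), , drop = FALSE]
    } else {
      # Anything else filters the names by regex
      df <- df[grepl(take, df[[col]]), , drop = FALSE]
    }
  }
  # Only a single remaining row counts as a result
  if (nrow(df) == 1) df$tsn else NA
}

# Stand-in for user input: returns the given answers one call at a time
scripted <- function(answers) {
  i <- 0
  function() {
    i <<- i + 1
    answers[i]
  }
}

cands <- data.frame(
  tsn = 1:3,  # made-up ids, for illustration only
  target = c("Poa alpina", "Poa annua", "Poa annua var. aquatica"),
  stringsAsFactors = FALSE
)
narrow_down(cands, scripted(c("ann", "1")))          # regex first, then row 1
narrow_down(cands, function() "Poa", max_prompts = 3) # never narrows, gives NA
```

In interactive use, `read_input` could be something like function() readline("> ").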
@zachary-foster Right, a while loop seems appropriate.