fishbaseapi
Duplicates in genera table
E.g.,
library("RMySQL")
library("dplyr")
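# Connect to the local MySQL database and reference the genera table
conn <- src_mysql("slbapp", host = "localhost", user = "root")
gen <- tbl(conn, from = "genera")
# Pull every genus name into a character vector
gens <- gen %>% select(GEN_NAME) %>% collect() %>% .$GEN_NAME
# Total number of rows in genera
length(gens)
#> [1] 26261
# Number of distinct genus names; fewer than the row count means repeated names
length(unique(gens))
#> [1] 26164
# Inspect one repeated name; transpose so each column prints as a row
(lanceola <- gen %>% filter(GEN_NAME == "Lanceola") %>% collect())
lanceola %>% data.frame() %>% t()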
#> [,1] [,2]
#> GenCode " 7265" "22248"
#> GEN_NAME "Lanceola" "Lanceola"
#> AuthorYear "Say, 1818" "Hind\xe1k, 1988"
#> SUBGEN NA NA
#> CommonName NA NA
#> AUTH "Say" "Hind\xe1k"
#> QUALIFICATION NA NA
#> YR "1818" "1988"
#> GenusRefno "84290" "80701"
#> Gender NA NA
#> FB_Status1 "valid" "valid"
#> FB_Status2 NA NA
#> FB_CURR_GEN NA NA
#> FB_NbSpp NA NA
#> Famcode " 331" "3754"
#> Subfamily NA NA
#> Tribe NA NA
#> Syncode NA NA
#> CAS_GEN NA NA
#> CAS_REF_NO "0" "0"
#> STAT_CODE NA NA
#> STAT_CODE1 NA NA
#> LineageID "0" "0"
#> CurrentGenusID "0" "0"
#> TimeStamp NA NA
#> Etymology NA NA
#> Distribution NA NA
#> Habitat NA NA
#> WaterSalinity NA NA
#> Marine "1" "0"
#> Brackish "0" "0"
#> Freshwater "0" "0"
#> Comment NA NA
#> Diagnosis NA NA
#> DspinesMin NA NA
#> DspinesMax NA NA
#> DsoftRaysMin NA NA
#> DsoftRaysMax NA NA
#> TotalDsoftRaysMin NA NA
#> TotalDsoftRaysMax NA NA
#> DsoftRaysBranchMin NA NA
#> DsoftRaysBranchMax NA NA
#> AspinesMin NA NA
#> AspinesMax NA NA
#> AsoftRaysMin NA NA
#> AsoftRaysMax NA NA
#> TotalAsoftRaysMin NA NA
#> TotalAsoftRaysMax NA NA
#> AsoftRaysBranchMin NA NA
#> AsoftRaysBranchMax NA NA
#> Entered NA "74"
#> Dateentered "2006-08-29 00:00:00" "2009-04-14 00:00:00"
#> Modified "4" "4"
#> Datemodified "2011-01-11 00:00:00" "2011-01-11 00:00:00"
#> Expert NA NA
#> Datechecked NA NA
#> Designation NA NA
#> AuthorRef NA NA
#> TaxonRank NA NA
#> TS "2015-05-11 10:17:24" "2015-05-11 10:17:24"
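So at least in this example the two rows are not exact duplicates. They share the genus name Lanceola but differ in GenCode (7265 vs. 22248), authority (Say, 1818 vs. Hindák, 1988), Famcode (331 vs. 3754), and the Marine flag (1 vs. 0), among other fields.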
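To see how far this goes beyond Lanceola, a check along these lines could list every repeated genus name and the columns on which the repeated rows disagree. This is only a minimal sketch on a toy data frame, not the real genera table: the two Lanceola rows copy a few values from the output above, while the "Aus" row and the helper `differing_cols()` are made up for illustration. It uses base R only, so it runs without a database connection.

```r
# Toy stand-in for the genera table (a few columns only; the "Aus" row is invented).
genera <- data.frame(
  GenCode    = c(7265, 22248, 1),
  GEN_NAME   = c("Lanceola", "Lanceola", "Aus"),
  YR         = c("1818", "1988", "1900"),
  Famcode    = c(331, 3754, 10),
  FB_Status1 = c("valid", "valid", "valid"),
  stringsAsFactors = FALSE
)

# Genus names that occur more than once.
dup_names <- unique(genera$GEN_NAME[duplicated(genera$GEN_NAME)])

# Hypothetical helper: columns whose values differ among the rows sharing a name.
differing_cols <- function(df, name) {
  rows <- df[df$GEN_NAME == name, , drop = FALSE]
  names(rows)[vapply(rows, function(col) length(unique(col)) > 1, logical(1))]
}

dup_names
differing_cols(genera, "Lanceola")
```

On the real table, `genera` would be the collected result of `tbl(conn, from = "genera")`, and looping `differing_cols()` over `dup_names` would show which repeated names are true duplicates and which are distinct genera that share a name.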
The problem is at least this: when we're getting data for the /taxa route, we merge data from the species, genera, and families tables. The species-to-genera merge is on genus name, because the species table has no genus code field at all (AFAICT). A genus name that appears more than once in genera therefore matches each species of that genus to more than one genera row.
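The effect of merging on genus name can be reproduced with toy data. This is a minimal sketch, assuming a species table with hypothetical `Genus` and `Species` columns (the real column names may differ) and an invented species epithet. Base R `merge()` stands in for whatever join the /taxa route actually uses.

```r
# Toy species table: one species in the genus Lanceola (column names are assumptions).
species <- data.frame(
  Genus   = "Lanceola",
  Species = "exampleus",  # invented epithet, for illustration only
  stringsAsFactors = FALSE
)

# Toy genera table holding the repeated genus name from the output above.
genera <- data.frame(
  GenCode  = c(7265, 22248),
  GEN_NAME = c("Lanceola", "Lanceola"),
  Famcode  = c(331, 3754),
  stringsAsFactors = FALSE
)

# Merge on genus name, since species carries no genus code to join on.
taxa <- merge(species, genera, by.x = "Genus", by.y = "GEN_NAME")

nrow(species)  # 1 species row going in
nrow(taxa)     # 2 rows coming out, one per matching genera row
```

Each output row carries a different Famcode, so in this toy case the single species ends up attached to two families.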
Note: this is only a problem for SeaLifeBase.
To do: check whether this is gone in the new database version.