fishbaseapi

Duplicates in genera table

Open · sckott opened this issue 9 years ago · 1 comment

E.g.,

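library("RMySQL")
library("dplyr")
# Connect to the local "slbapp" database and reference the genera table
conn <- src_mysql("slbapp", host = "localhost", user = "root")
gen <- tbl(conn, from = "genera")
# Pull every genus name into a character vector
gens <- gen %>% select(GEN_NAME) %>% collect() %>% .$GEN_NAME
length(gens)
#> [1] 26261
# Fewer unique names than rows, so some genus names occur more than once
length(unique(gens))
#> [1] 26164
# Inspect one duplicated name; transpose so each column is one record
(lanceola <- gen %>% filter(GEN_NAME == "Lanceola") %>% collect())
lanceola %>% data.frame %>% t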
#>                    [,1]                  [,2]                 
#> GenCode            " 7265"               "22248"              
#> GEN_NAME           "Lanceola"            "Lanceola"           
#> AuthorYear         "Say, 1818"           "Hind\xe1k, 1988"    
#> SUBGEN             NA                    NA                   
#> CommonName         NA                    NA                   
#> AUTH               "Say"                 "Hind\xe1k"          
#> QUALIFICATION      NA                    NA                   
#> YR                 "1818"                "1988"               
#> GenusRefno         "84290"               "80701"              
#> Gender             NA                    NA                   
#> FB_Status1         "valid"               "valid"              
#> FB_Status2         NA                    NA                   
#> FB_CURR_GEN        NA                    NA                   
#> FB_NbSpp           NA                    NA                   
#> Famcode            " 331"                "3754"               
#> Subfamily          NA                    NA                   
#> Tribe              NA                    NA                   
#> Syncode            NA                    NA                   
#> CAS_GEN            NA                    NA                   
#> CAS_REF_NO         "0"                   "0"                  
#> STAT_CODE          NA                    NA                   
#> STAT_CODE1         NA                    NA                   
#> LineageID          "0"                   "0"                  
#> CurrentGenusID     "0"                   "0"                  
#> TimeStamp          NA                    NA                   
#> Etymology          NA                    NA                   
#> Distribution       NA                    NA                   
#> Habitat            NA                    NA                   
#> WaterSalinity      NA                    NA                   
#> Marine             "1"                   "0"                  
#> Brackish           "0"                   "0"                  
#> Freshwater         "0"                   "0"                  
#> Comment            NA                    NA                   
#> Diagnosis          NA                    NA                   
#> DspinesMin         NA                    NA                   
#> DspinesMax         NA                    NA                   
#> DsoftRaysMin       NA                    NA                   
#> DsoftRaysMax       NA                    NA                   
#> TotalDsoftRaysMin  NA                    NA                   
#> TotalDsoftRaysMax  NA                    NA                   
#> DsoftRaysBranchMin NA                    NA                   
#> DsoftRaysBranchMax NA                    NA                   
#> AspinesMin         NA                    NA                   
#> AspinesMax         NA                    NA                   
#> AsoftRaysMin       NA                    NA                   
#> AsoftRaysMax       NA                    NA                   
#> TotalAsoftRaysMin  NA                    NA                   
#> TotalAsoftRaysMax  NA                    NA                   
#> AsoftRaysBranchMin NA                    NA                   
#> AsoftRaysBranchMax NA                    NA                   
#> Entered            NA                    "74"                 
#> Dateentered        "2006-08-29 00:00:00" "2009-04-14 00:00:00"
#> Modified           "4"                   "4"                  
#> Datemodified       "2011-01-11 00:00:00" "2011-01-11 00:00:00"
#> Expert             NA                    NA                   
#> Datechecked        NA                    NA                   
#> Designation        NA                    NA                   
#> AuthorRef          NA                    NA                   
#> TaxonRank          NA                    NA                   
#> TS                 "2015-05-11 10:17:24" "2015-05-11 10:17:24"

So at least in this example, the two records for the genus Lanceola are not exact copies of each other: they differ in GenCode, AuthorYear, GenusRefno, Famcode, and Marine, among other fields.
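To make the comparison above less manual, here is a minimal base R sketch that lists the columns in which a set of rows disagree. The data frame is a hand-typed stand-in holding only a few of the columns from the output above (not a live query), and the names `differs` and `differing_cols` are made up for illustration.

```r
# Stand-in for the two Lanceola rows, using a handful of columns
# copied from the printed output (assumption: types simplified).
lanceola <- data.frame(
  GenCode    = c(7265, 22248),
  GEN_NAME   = c("Lanceola", "Lanceola"),
  YR         = c("1818", "1988"),
  Famcode    = c(331, 3754),
  FB_Status1 = c("valid", "valid"),
  Marine     = c(1, 0),
  Brackish   = c(0, 0),
  stringsAsFactors = FALSE
)

# A column differs if it holds more than one distinct value across the rows.
# Note that unique() treats NA as a value, so NA vs. non-NA counts as a difference.
differs <- vapply(lanceola, function(col) length(unique(col)) > 1, logical(1))
differing_cols <- names(differs)[differs]
differing_cols
```

For these stand-in rows the result is GenCode, YR, Famcode, and Marine. Against the real table, the same two lines could be applied to the collected `lanceola` data frame.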

The problem is at least this: when we're getting data for the /taxa route, we merge data from the species and genera (and families) tables, and we merge on genus name from species to genera because the species table has no genus code field at all (AFAICT), so a duplicated genus name matches more than one row in genera.
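A small sketch of why the name-based merge is a problem, using base R and two invented miniature tables. The `species` column names and all values other than the two Lanceola GenCode/Famcode pairs are hypothetical, and `merge()` stands in for whatever join the API actually performs.

```r
# Hypothetical species table: one species per genus, keyed by genus name only.
species <- data.frame(
  Genus   = c("Lanceola", "OtherGenus"),
  Species = c("a", "b"),
  stringsAsFactors = FALSE
)

# Miniature genera table with the duplicated Lanceola entries.
genera <- data.frame(
  GenCode  = c(7265, 22248, 1),
  GEN_NAME = c("Lanceola", "Lanceola", "OtherGenus"),
  Famcode  = c(331, 3754, 10),
  stringsAsFactors = FALSE
)

# Left join on genus name: the single Lanceola species matches both genera rows.
taxa <- merge(species, genera, by.x = "Genus", by.y = "GEN_NAME", all.x = TRUE)
nrow(species)  # 2 species rows go in
nrow(taxa)     # 3 rows come out

# Genus names that would fan out a name-based join.
dup_names <- unique(genera$GEN_NAME[duplicated(genera$GEN_NAME)])
dup_names
```

The extra row carries a different Famcode, so the duplicate is not harmless: one species ends up attached to two different families.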

Note: this is only a problem for SeaLifeBase.

sckott · Mar 14 '16 23:03

See if this is gone in the new database version.

sckott · Apr 28 '17 17:04