Which tables need api routes?
@cboettig Curious which tables you think would be good to have routes for? Or are you mostly interested in replicating what the package already does, with the routes being secondary?
The ones in the package that use HTML scraping are all ones that have been added on request, so they would be good things to support (will have to check which tables they correspond to). Looks like those functions are:
- [x] ecology_from_html.R
- [ ] enviro-climate-range.R
- [x] fao_area.R
- [x] fooditems.R
- [x] getPredators.R
- [x] length-weight-table.R
- [x] metabolism.R
added faoareas in ddce1c2d47a8611c6f6dfac8392ad694ecf83d07
@cboettig Did you want API endpoints for each of those in the list above? Or did you actually mean access to those tables, so that you'd take care of combining data on the R side?
Right, I was just thinking access to the relevant tables and we would combine on the R side. Less efficient that way but also lighter on the server & just more familiar. We can then revisit doing more preprocessing later to reduce the number of api calls needed & avoid as much joining in R.
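To make the "combine on the client side" idea concrete, here is a minimal sketch in Python (the same pattern would be a `dplyr` join in R). It assumes each table endpoint returns a list of records sharing a `SpecCode` key; the field names other than `SpecCode` and all values are placeholders, not real FishBase data, and no real endpoint is called.

```python
# Sketch only: these rows stand in for JSON records returned by two
# separate table endpoints. All values are placeholders.
species_rows = [
    {"SpecCode": 1, "Genus": "GenusA", "Species": "alpha"},
    {"SpecCode": 2, "Genus": "GenusB", "Species": "beta"},
    {"SpecCode": 3, "Genus": "GenusC", "Species": "gamma"},
]
ecology_rows = [
    {"SpecCode": 1, "Note": "placeholder-1"},
    {"SpecCode": 2, "Note": "placeholder-2"},
]


def left_join(left, right, key):
    """Left-join two lists of dicts on `key`, keeping unmatched left rows."""
    # Index the right-hand table by key so each lookup is O(1).
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for row in left:
        # A left row with no match is kept as-is (joined with an empty dict).
        for match in index.get(row[key], [{}]):
            joined.append({**row, **match})
    return joined


combined = left_join(species_rows, ecology_rows, "SpecCode")
```

This costs one API call per table plus a cheap join on the client, which is the trade-off described above.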
@cboettig Okay, I'll comment out those methods that do joins, etc. and just give access to the tables needed
Compare this list to the list of tables shown on each species summary page: https://github.com/ropensci/rfishbase/issues/36
As usual, the mapping between the manual, the website, and the SQL tables is hardly 1:1:1, but anyway...
Here's a quick overview of the main tables as extracted from the FishBase Manual.
Key:
- [x] means we've implemented the API endpoint
- ** means prioritized endpoint
NOMENCLATURE
- [x] The SPECIES Table
- [x] The COMMON NAMES Table
- [x] The SYNONYMS Table
DISTRIBUTION
- [x] The STOCKS Table
- [x] The FAOAREAS Table
- [x] The FAOAREAS REF Table
- [x] The COUNTRIES Table
- [x] The COUNTREF Table
- [x] The INTRODUCTIONS Table
- [x] The OCCURRENCES Table
- [ ] The EXPEDITIONS Table
FAO STATISTICS
- [ ] FAO Statistics
- [ ] FAO Catches
- [ ] FAO Aquaculture
POPULATION DYNAMICS
- [x] The POPCHAR Table
- [x] The LENGTH-WEIGHT Table **
- [x] The LENGTH-FREQUENCY Table
- [x] The LENGTH-LENGTH Table
- [x] The POPGROWTH Table
- [ ] The RECRUITMENT Table
TROPHIC ECOLOGY
- [x] The ECOLOGY Table **
- [x] The FOOD ITEMS Table
- [x] The DIET Table
- [x] The RATION Table
- [x] The POPQB Table
- [x] The PREDATORS Table
REPRODUCTION
ICHTHYOPLANKTON
MORPHOLOGY AND PHYSIOLOGY
- [x] The MORPHOLOGY Table
- [ ] The VISION Table
- [ ] The BRAINS Table
- [x] The OXYGEN Table
- [x] The SWIMMING and SPEED Tables
- [ ] The GILL AREA Table
- [ ] The PROCESSING Table
GENETICS AND AQUACULTURE
Other Tables
Not suggesting that we need to implement all of them, but just putting this down here as a placeholder to help group and prioritize, as well as track what we already have. I'll try and update and annotate the above list to highlight things we want to prioritize or ignore, etc.
The data is pretty messy, so the R implementation that returns clean, tidy data.frames is always going to lag substantially behind the API anyway. I hope to have a suitable set of backbone functions, and then hopefully we can encourage contributions of data cleaning routines associated with a particular call.
@cboettig nice, those table urls don't seem to resolve for me :(
And do the check marks mean that we should have api routes for each table?
@sckott heh, welcome to FishBase. Just keep refreshing and I think most of the links should work. Good illustration of why scraping the website was a terrible idea.
I've started to check off the ones I think we have, and to add **'s to the ones I think we should prioritize. (Clearly I should be making better use of GitHub emoji things here -- edits welcome!)
okay, thx, makes sense
@sckott You raise a good question about naming conventions for endpoints on https://github.com/ropensci/rfishbase/issues/31#issuecomment-73368519
I see your point about having good REST names, but it's a bit tricky here since we don't have control over the naming conventions of the SQL schema and tables, yet we will want the API to be consistent with them and familiar to people who know them. The FishBase manual lists six tables in the population dynamics group above, so it might not be obvious that `populations` refers to `popGrowth`. We might be best off making the correspondence between the API endpoint and the SQL table 1:1, at least for endpoints which are essentially access to those particular tables. It's not super nice because the way the tables are organized is something of a mess, but I worry that things will just be more opaque if we rename them. Does that make sense?
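A rough sketch of what the 1:1 convention could look like, in Python for illustration. The table names are taken from the list earlier in this thread and lowercased; the real SQL table names and route paths may differ, and `route_for` is a hypothetical helper, not part of any existing code.

```python
# Hypothetical list of table names, lowercased from the manual's table list.
TABLES = ["species", "popchar", "popgrowth", "popqb"]


def route_for(table):
    """Return the endpoint path for a table, reusing the table name unchanged."""
    if table not in TABLES:
        # A friendlier alias such as "populations" is deliberately not accepted,
        # so every route can be traced straight back to one table.
        raise KeyError(f"no endpoint for table {table!r}")
    return f"/{table}"


routes = {table: route_for(table) for table in TABLES}
```

With this convention anyone who knows the table name knows the route, at the cost of inheriting the schema's naming quirks.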
I'm trying to come up with good names for the higher-level functions implemented in the R package, most of which will need more than one SQL call to return something meaningful instead of just a bunch of reference codes to other tables anyway. It would be great to get your input on those names.
It may make sense to handle some of those on the server side with their own endpoints, which could reduce the number of API queries. I'm not sure the reduction in queries would outweigh the increased computation on the server, though, particularly since whatever server this ends up on will probably be relatively underpowered, while doing the manipulation client-side in R with dplyr is pretty efficient. (If the API were powering a website, where the client has far less computational power than an R session, doing these computations on the server might make more sense.)
@cboettig Is this updated with all endpoints available in the API?
I think I've kept it up-to-date so far
@cboettig k, just curious