
CRAN task view proposal: Paleontology

Open willgearty opened this issue 2 years ago • 11 comments

Scope

Computational paleontology (or paleobiology) is a thriving field. Gone are the days of just digging up fossils; paleontologists now have the luxury of being able to perform a wide array of complex computational analyses on local and global compendia of fossil occurrence, phylogenetic, and morphological data to study the functional and phylogenetic evolution of organisms, ecosystem function and ecological interactions, paleobiogeographic patterns, and more. Until recently, computational paleontologists have mostly relied on resources designed for evolutionary biologists, ecologists, GISers, and data scientists to accomplish such analyses. However, slowly but surely, resources (including explicit R packages) are being developed to cater to these paleontological tasks.

This CTV brings together a) a collection of traditional packages that are often seen in use in standard computational paleontological workflows, b) more recent paleontological or paleo-adjacent packages that are commonly in use in paleontology, and c) cutting edge paleo-explicit packages that we believe should be adopted by the paleontological community. Therefore, the purpose of this CTV is to provide young and old paleontologists something of a guide to developing a wide variety of computational paleontological workflows. We have included packages (~50 at the moment) that span both the data acquisition/cleaning and analytical components of such workflows, with analyses covering paleoecology, paleobiogeography, phylogenetics, and more (see sections below).

We have excluded many of the most common packages (e.g., tidyverse, sf) because they are often imported by packages in this CTV and they are often covered exhaustively in other CTVs and guides. Further, we have excluded older packages that have been superseded by more robust and/or featureful newer packages (e.g., there are a ~million packages related to ENM, but we have only included a handful). We also recognize that there are many other packages out there that are relevant to or explicitly for paleontology (we originally built a list of ~140 packages that we whittled down to the list below). We excluded most of these packages because we, as a group, had little experience with them or because the packages seemed unfinished or too niche to be useful. However, we'd love to hear from anyone that might have suggestions about other packages to include/exclude. Finally, where applicable, we plan to direct users to other CTVs that overlap in scope (see below).

Packages

Data acquisition

mapast, neotoma2, paleobioDB, rgbif, rgplates, ridigbio, chronosphere

Data cleaning

CoordinateCleaner, fossilbrush, palaeoverse

Data visualization

deeptime, ggtern, ggtree, SDAR, StratigrapheR, tidypaleo, geoChronR, rphylopic

Paleoecology

ade4, dismo, ecospace, ENMeval, ENMTools, fossil, fundiversity, vegan

Paleobiogeography and biodiversity

BAT, Compadre, divDyn, divvy, iNEXT, sepkoski

Phylogenetics

caper, diversitree, fbdR, FossilSim, geiger, mvMORPH, paleobuddy, paleotree, phytools, strap

Morphology

geomorph, Claddis, dispRity, morphospace

Time series

paleoTS, evoTS, layeranalyzer

Overlap

There is considerable overlap between the scope of this proposed CTV and the scope of other CTVs, including Environmetrics, Phylogenetics, TimeSeries, and Spatial. This stems from the fact that this proposed CTV is subject-oriented, rather than methodology-oriented. This doesn't appear to be unprecedented, though, given there are already CTVs on other subjects (e.g., ChemPhys). Further, this CTV is focused on which packages in these other CTVs may be used specifically within computational paleontological workflows.

Maintainers

Principal maintainer: @willgearty (also the principal maintainer of the Phylogenetics CTV)

Co-maintainers: @AlfioAlessandroChiarenza, @bethany-j-allen, @ChristopherDavidDean, @KEichenseer, @LewisAJones, and @pedrolgodoy (this is a @palaeoverse project)

willgearty avatar Sep 19 '23 22:09 willgearty

Thanks for the proposal, Will @willgearty, and apologies for the slow response! I've finally had a closer look.

I like the proposal but I'm not yet fully convinced that the task view will be sufficiently separated from the existing task views. Relatedly, your process of package selection appears to be somewhat subjective, which we try to avoid in task views by adopting clear inclusion/exclusion criteria. In particular, excluding packages that you feel are too old or that you have no experience with is too subjective.

Hence, I would ask you to establish sufficiently clear rules for inclusion/exclusion of a package, e.g., that it must be explicitly geared towards paleontology or something like that. And rules that would necessitate some individual review process (e.g., to determine whether a package is "useful" or "finished") should be avoided.

Regarding the maintainers: It's great to see an active community proposing a task view. Seven maintainers might still be feasible but maybe a smaller team would be easier to coordinate? Others could still contribute through issues and PRs. Also, I'm not sure whether the palaeoverse community is already diverse and heterogeneous enough that different palaeontological views are reflected in it. Or would it help to bring in maybe one person from the outside as well?

I'm also pinging the principal maintainers of the Spatial, SpatioTemporal, and Environmetrics task views here: @rsbivand, @edzer, @gavinsimpson. Maybe you have some thoughts/ideas as well?

zeileis avatar Oct 02 '23 22:10 zeileis

Thanks @zeileis for the helpful comments.

We are certainly open to defining clearer rules for package inclusion/exclusion. I think if we are as exclusive as "explicitly geared towards paleontology", we'll be leaving lots of commonly used packages out (but you are right in that it would then be a very clear rule). However, most, if not all of these excluded packages are already in other task views, so they would at least already be covered there.

We'll give a little time for other folks to provide their thoughts/ideas as well, then we'll look into revising accordingly.

willgearty avatar Oct 03 '23 13:10 willgearty

Hi all! I am also unsure but, as I see it, the overlap with Phylogenetics is also non-negligible (but you know the TV better than I do). In short, what is not clear to me is: "do you have in mind at least some core packages in your list that are very specific to Paleontology, rather than packages from related topics that are merely useful for Paleontology?" My question is probably quite naive (maybe these are clearly listed in your proposal but I am not able to identify them). These are the packages that should be put forward in your TV, with packages that have a broader scope but can be useful for the field mentioned afterward. But again, my comment might be completely wrong.

tuxette avatar Oct 04 '23 18:10 tuxette

My deepest apologies (to my co-maintainers and the CTV editors) for the horrible delay in responding to the feedback here. Despite some reservations, we've decided to go for a more conservative approach, as suggested by @zeileis, that includes only packages that are either explicitly designed for paleontology or are explicitly advertised to paleontologists (it appears this is similar to the approach of the Agriculture CTV, for example).

There are many other packages that paleontologists use as part of their workflows, and so, as part of the development of this CTV, we plan to suggest many of these packages to other CTVs where we believe they will be appropriate. We then plan to link out to these CTVs to ensure that users of the Paleontology CTV can find all of the resources that they may need for their highly interdisciplinary work (see below).

@tuxette there aren't many interpackage dependencies in paleontology, so I wouldn't say any packages really stand out as "core" packages. However, if I had to pick a handful of packages based solely on their breadth of use, I would probably say palaeoverse, paleotree, and paleobioDB, but I'm probably biased. I'd be happy to look into download numbers in the future to identify which packages are most widely used before finalizing the list of "core" packages.

Here is an updated proposal for the Paleontology CTV:

Scope

Computational paleontology (or paleobiology) is a thriving field. Gone are the days of just digging up fossils; paleontologists now have the luxury of being able to perform a wide array of complex computational analyses on local and global compendia of fossil occurrence, phylogenetic, and morphological data to study the functional and phylogenetic evolution of organisms, ecosystem function and ecological interactions, paleobiogeographic patterns, and more. Until recently, computational paleontologists have mostly relied on resources designed for evolutionary biologists, ecologists, GISers, and data scientists to accomplish such analyses. However, slowly but surely, resources (including explicit R packages) are being developed to cater to these paleontological tasks.

This CTV brings together the vast majority of paleontological or paleo-adjacent packages that are in use in paleontology. The purpose of this CTV is to provide young and old paleontologists something of a guide to developing a wide variety of computational paleontological workflows. We have included packages (~50 at the moment) that span both the data acquisition/cleaning and analytical components of such workflows, with analyses covering paleoecology, paleobiogeography, phylogenetics, and more (see sections below).

We have excluded many of the most common packages (e.g., tidyverse, sf) because they are often imported by packages in this CTV and they are often covered exhaustively in other CTVs and guides. Further, to keep the list manageable, we also do not include packages that are often used in paleontological workflows but are not explicitly designed for or advertised to paleontologists. Where applicable, we plan to direct users to other CTVs that include many of these packages (and also plan to submit recommendations to these CTVs as necessary).

Packages

Data acquisition

chronosphere, folio, neotoma2, paleobioDB, rgbif, rgplates, ridigbio, rmacrostrat, rpaleoclim

Data cleaning

CoordinateCleaner, fossilbrush, palaeoverse

Data visualization

deeptime, GEOmap, rphylopic, SDAR, StratigrapheR, tidypaleo

Paleoecology

analogue, ecospace, fossil, rioja (and Environmetrics CTV)

Paleobiogeography and biodiversity

Compadre, divDyn, divvy, hespdiv, ppgm, sepkoski (and Spatial CTV)

Phylogenetics

CladeDate, fbdR, FossilSim/FossilSimShiny, paleobuddy, paleotree, RRphylo, strap (and Phylogenetics CTV)

Morphology

morphospace (and Phylogenetics CTV)

Time series

adePEM, astrochron, evoTS, paleoTS, RRatepol (and TimeSeries CTV)

Paleoclimate and Earth System variables

Bchron, cRacle, DAIME, geoChronR, isogeochem, pastclim, sedproxy

Overlap

Only 10 of the proposed packages are included in other CTVs (rgbif, analogue, rioja, FossilSim, paleobuddy, paleotree, strap, paleoTS, deeptime, and GEOmap).
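
For anyone wanting to re-check this overlap as task views change, the comparison could be sketched roughly as follows. This is only an illustration: `ctv_overlap()` is a hypothetical helper (not part of any package), the `proposed` vector is a small subset of the list above, and the live query assumes the ctv package is installed and a CRAN mirror is reachable.

```r
# Hypothetical helper: which proposed packages already appear in other task views?
# `views` is a list of task view objects, each with a `packagelist` data frame
# that has a `name` column (the structure returned by ctv::available.views()).
ctv_overlap <- function(proposed, views, exclude = "Paleontology") {
  # Skip the Paleontology view itself, in case it has been published by then.
  views <- Filter(function(v) !identical(v$name, exclude), views)
  listed <- unique(unlist(lapply(views, function(v) v$packagelist$name)))
  sort(intersect(proposed, listed))
}

# A few packages from the updated proposal (illustrative subset, not the full list).
proposed <- c("palaeoverse", "paleotree", "paleobioDB", "rgbif", "deeptime", "divDyn")

# Live check: only runs interactively, and needs ctv plus network access.
if (interactive() && requireNamespace("ctv", quietly = TRUE)) {
  views <- ctv::available.views(repos = "https://cloud.r-project.org")
  print(ctv_overlap(proposed, views))
}
```

Keeping the comparison in a small function that takes the views as an argument means it can be tried on a hand-built list without contacting CRAN.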

willgearty avatar Jun 24 '24 21:06 willgearty

@zeileis @tuxette Bumping this since the summer is wrapping up. Please let me know what you think of the new proposal!

willgearty avatar Sep 03 '24 13:09 willgearty

@willgearty : Sorry, I completely missed your update of June. I took a look at it today and I think that I understand where this goes. For me, this is convincing but @zeileis has a better global view of CTV and possible overlaps so he might have a different opinion. Also, @rsbivand could have interesting additional insights to provide here maybe? A minor remark is that the titles sometimes give the impression that the corresponding section is slightly out of scope. For instance, the generic title "Time series" is very broad, and until we look at the package list, it is not clear that it doesn't overlap with the TimeSeries task view (also, shouldn't deeptime be included in this section?). I’m not sure exactly how to improve it, but I suspect the time series have a particular focus that could perhaps be reflected more precisely in the title.

tuxette avatar Sep 05 '24 07:09 tuxette

Thanks @tuxette. That section should probably be titled "Time series analysis" to better reflect that those packages are for analyzing time series, not just visualizing them (this is also why deeptime is not included). I can definitely go back through the headings once the package list is finalized to make sure they are succinct and descriptive.

willgearty avatar Sep 09 '24 14:09 willgearty

Will @willgearty, apologies for the late feedback. I agree with Nathalie @tuxette that this goes in the right direction and that the task view is also well-separated from the existing task view topics. I still think that the explanation of the scope needs to be phrased better - but from the current list of packages it's sufficiently clear to me what you want to do. So you can still improve the scope in the next revision.

In short, I endorse this proposal and suggest we let Will and his co-maintainers work out the details. Roger @rsbivand, Dirk @eddelbuettel, Julia @jpiaskowski, and Nathalie @tuxette, if you agree, you can comment below or just react with a thumbs-up.

zeileis avatar Sep 22 '24 22:09 zeileis

This looks great (I endorse). You can also list other relevant task views (e.g., TimeSeries) and how they specifically support paleontological applications, but that is your choice.

jpiaskowski avatar Sep 23 '24 16:09 jpiaskowski

Thanks for the positive feedback, Julia and Dirk. Together with my endorsement you have the necessary three votes (plus Nathalie was also already very positive). So you can move on and elaborate the entire task view.

Do you want to do that first in your own repository and then transfer it later to the cran-task-views organization? Or should I already open cran-task-views/Paleontology/ for you? Either is fine with me.

zeileis avatar Sep 23 '24 20:09 zeileis

Fantastic news, thank you all for the feedback and support!

I have a draft in progress here: https://github.com/palaeoverse/PaleontologyTaskView. I'm happy to keep using that and then transfer it later.

willgearty avatar Sep 23 '24 20:09 willgearty

Our task view draft is now ready for review: https://github.com/palaeoverse/PaleontologyTaskView/blob/main/Paleontology.md. I'd also appreciate feedback from @benmarwick to make sure our two task views remain unique and complementary.

willgearty avatar Oct 18 '24 15:10 willgearty