Functions to save and load RData from ScienceBase

Open aappling-usgs opened this issue 10 years ago • 4 comments

In line with #164, and to make SB really R-friendly, we could write functions that load native R data from ScienceBase (and save it there?).

I would use this in mda.streams right away if it were available.

aappling-usgs avatar Feb 16 '16 15:02 aappling-usgs

You thinking just binary Rdata objects or more generic data.frame to CSV/TSV?

lawinslow avatar Feb 16 '16 15:02 lawinslow

I was thinking #164 was about 'more generic data.frame to CSV/TSV', and I think that's also a good idea.

But writing binary data, and keeping it binary, has value of its own. So I'm thinking of this issue as a separate idea.

What are your thoughts on the naming convention for these direct read-in functions (#185, #164, #165, etc.)?

aappling-usgs avatar Feb 16 '16 15:02 aappling-usgs

Hmmm, if we were using R syntax as a parallel, sb_load and sb_save would be a good way to go. Conversely, all of these functions are referring to items, which would push us towards the current sbtools parlance of item_*something*. Just brainstorming here. item_save_rdata, item_load_rdata. item_load_df, item_save_df (or maybe csv).

You have any thoughts?
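
Purely as an illustration of the item_save_rdata / item_load_rdata idea, here is a minimal sketch. The function signatures are assumptions, and the `upload` and `download` arguments are hypothetical stand-ins for whatever sbtools call would actually move the file; in this sketch they just copy files into a local folder so it runs offline.

```r
# Hypothetical sketch: names and arguments are assumptions, not sbtools API.
# upload(sb_id, file) and download(sb_id, name, dest) stand in for the real
# ScienceBase transfer step.

item_save_rdata <- function(sb_id, object_names, file_name, upload,
                            envir = parent.frame()) {
  tmp <- file.path(tempdir(), file_name)
  # base::save() keeps the objects in R's binary format
  save(list = object_names, file = tmp, envir = envir)
  upload(sb_id, tmp)
  invisible(file_name)
}

item_load_rdata <- function(sb_id, file_name, download,
                            envir = parent.frame()) {
  tmp <- file.path(tempdir(), paste0("dl_", file_name))
  download(sb_id, file_name, tmp)
  # base::load() restores the objects into envir and returns their names
  load(tmp, envir = envir)
}

# Local stand-ins for the transfer step: a folder per "item" under tempdir()
fake_store <- file.path(tempdir(), "fake_sb")
local_upload <- function(sb_id, file) {
  dir.create(file.path(fake_store, sb_id), showWarnings = FALSE, recursive = TRUE)
  file.copy(file, file.path(fake_store, sb_id, basename(file)), overwrite = TRUE)
}
local_download <- function(sb_id, name, dest) {
  file.copy(file.path(fake_store, sb_id, name), dest, overwrite = TRUE)
}

# Round trip: save a data.frame, then load it into a fresh environment
flow <- data.frame(site = c("a", "b"), q = c(1.5, 2.5))
item_save_rdata("item123", "flow", file_name = "flow.RData",
                upload = local_upload)
restored <- new.env()
loaded_names <- item_load_rdata("item123", "flow.RData",
                                download = local_download, envir = restored)
```

Passing object names as a character vector (rather than through `...`) sidesteps the question of which environment `save()` looks in when it is called from inside a wrapper.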

lawinslow avatar Feb 16 '16 16:02 lawinslow

I like that item_load_rdata could eventually be complemented by item_load_tsv, etc. Possible modifications:

A. It's possible in SB (and sometimes sensible) to store multiple data files within a single item. This could be accommodated in a couple of ways:

  1. prefix every related function with item_file, e.g., item_file_load_rdata(sb_id, ..., names, session=current_session()) to be parallel to item_file_download
  2. open a new line of function names starting with file or data or sbdata, e.g., data_load_rdata(filename, sb_id, ...).

B. Either way, could then go the route of

  1. one function per datatype (data_load_rdata, data_load_wfs, etc.) or
  2. a single function with a flag for the datatype, e.g., data_load(filename, sb_id, data_type='rdata', ...)

The choice between B1 and B2 could depend on whether there are specialized arguments (and how many) for each data type.

I'm leaning toward A2 and B1 at the moment.
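
To make the B1/B2 trade-off concrete, here is a rough sketch. All names are hypothetical, and the readers take a local path, i.e. they assume the file has already been downloaded from the item; the ScienceBase step is left out on purpose.

```r
# Hypothetical names throughout; nothing here is existing sbtools API.

# B1: one function per data type, each free to carry its own arguments
data_load_rdata <- function(path, envir = new.env()) {
  load(path, envir = envir)
  as.list(envir)  # return the restored objects as a named list
}

data_load_tsv <- function(path, header = TRUE, ...) {
  read.delim(path, header = header, ...)
}

# B2: a single entry point with a data_type flag that forwards to the above;
# type-specific arguments have to travel through `...`
data_load <- function(path, data_type = c("rdata", "tsv"), ...) {
  data_type <- match.arg(data_type)
  switch(data_type,
    rdata = data_load_rdata(path, ...),
    tsv   = data_load_tsv(path, ...)
  )
}

# Usage: both routes read the same file and give the same result
tsv_path <- tempfile(fileext = ".tsv")
write.table(data.frame(site = c("a", "b"), q = c(1.5, 2.5)), tsv_path,
            sep = "\t", row.names = FALSE, quote = FALSE)
from_flag <- data_load(tsv_path, data_type = "tsv")
from_fn   <- data_load_tsv(tsv_path)
```

With B2 the specialized arguments are hidden behind `...`, so they are harder to document and check; that is part of why B1 looks appealing when the data types differ much.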

aappling-usgs avatar Feb 16 '16 16:02 aappling-usgs