Crates are more than tables
In the future, crates will have data sources that do not resemble the tables we expect. They will be web API requests to map servers or custom APIs.
We need a way to turn structured data into a table so that the hashing works, or we need a different solution.
We should brainstorm some solutions and maybe prototype a few things.
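One candidate for the "different solution" is to hash the structured records directly instead of building a table first. This is only a sketch: it assumes the source can be reduced to a list of JSON-serializable dicts, and `hash_record` and `hash_records` are hypothetical names, not part of forklift. Forklift's actual hashing may work differently.

```python
import hashlib
import json


def hash_record(record):
    """Return a stable hex digest for one dict-like record.

    Assumption: values are JSON-serializable. Keys are sorted so that
    field order in the source response does not change the hash.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(',', ':'))

    return hashlib.sha256(canonical.encode('utf-8')).hexdigest()


def hash_records(records):
    """Return the set of digests for an iterable of records."""
    return {hash_record(record) for record in records}


#: hypothetical rows, e.g. parsed from a JSON API response
previous = [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}]
current = [{'name': 'a', 'id': 1}, {'id': 2, 'name': 'c'}]

old_hashes = hash_records(previous)
new_hashes = hash_records(current)

#: digests only in the new set are adds/updates; digests only in the old set are deletes
changed = new_hashes - old_hashes
removed = old_hashes - new_hashes
```

Comparing sets of digests like this would detect changes without a table, but the data would still need to be written somewhere for the destination update.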
How much do we want to change (if at all) the Crate API? Could we do something like this for feature service sources?
Crate(source_name='https://url-to-feature-service', source_workspace=None, ...)
And then either do a URL regex on source_name or check for source_workspace == None? Or maybe source_workspace should be some sort of constant like FEATURE_SERVICE?
Or maybe source_name should be the name of the feature service and source_workspace should be some sort of base URL? This seems like a bit of a pain.
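To compare the regex and constant options, here is a rough sketch of what the check could look like. `FEATURE_SERVICE`, `SERVICE_URL`, and `is_feature_service` are hypothetical names, and the URL pattern is an assumption about what feature service URLs look like (ending in `/FeatureServer` or `/MapServer` plus an optional layer index).

```python
import re

#: hypothetical sentinel for source_workspace; not an existing forklift constant
FEATURE_SERVICE = 'FEATURE_SERVICE'

#: assumption: service URLs end in /FeatureServer or /MapServer plus an optional layer index
SERVICE_URL = re.compile(r'^https?://.+/(FeatureServer|MapServer)(/\d+)?/?$', re.IGNORECASE)


def is_feature_service(source_name, source_workspace):
    """Return True if the source should be treated as a feature service."""
    if source_workspace == FEATURE_SERVICE:
        return True

    return source_workspace is None and bool(SERVICE_URL.match(source_name or ''))
```

Note that a URL like `https://url-to-feature-service` would not match this pattern, which hints at how fragile the regex approach could be. The explicit constant is less clever but harder to get wrong.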
What could this look like for custom API requests? Do we need to pass some sort of optional get_data parameter?
def get_data():
    #: request data
    #: build a table and then return the path to it

Crate(source_name=None, source_workspace=None, get_data=get_data, ...)
What about creating some basic data readers, storing the common ones in forklift, and then passing the data reader function to the crate at creation time?
Something like this?
from forklift.readers import feature_service
Crate(source_name='https://url-to-feature-service', source_workspace=None, get_data=feature_service, ...)
What about:
from forklift.readers import feature_service, json_api, csv_api
data_reader = lambda: feature_service(url, and, other, options, like, credentials)
json_data_reader = lambda: json_api(url, and, maybe another function for filtering and transforming?)
Crate(source_name='My data set', source_workspace=None, get_data=data_reader)
Crate(source_name='Sure sites', source_workspace=None, get_data=json_data_reader)
The reader may also need another function for filtering fields, etc. If you like this solution, we should brainstorm a data reader model that works for the scenarios we can think of.
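As a starting point for that brainstorm, here is one possible shape for a reader model with optional filtering and transforming. Everything here is hypothetical: `make_reader`, `fetch_sites`, and the sample records are invented for illustration, and a CSV file stands in for whatever table format forklift would actually need.

```python
import csv
import os
import tempfile


def make_reader(fetch, transform=None, fields=None):
    """Build a zero-argument get_data callable.

    fetch: callable returning an iterable of dict records
    transform: optional callable applied to each record; return None to drop it
    fields: optional list of field names to keep, in output order
    """
    def get_data():
        rows = []
        for record in fetch():
            if transform is not None:
                record = transform(record)
            if record is None:
                continue
            if fields is not None:
                record = {field: record.get(field) for field in fields}
            rows.append(record)

        #: assumption: a CSV file stands in for the table that would be hashed
        handle, path = tempfile.mkstemp(suffix='.csv')
        with os.fdopen(handle, 'w', newline='') as table:
            names = fields if fields is not None else sorted({key for row in rows for key in row})
            writer = csv.DictWriter(table, fieldnames=names)
            writer.writeheader()
            writer.writerows(rows)

        return path

    return get_data


def fetch_sites():
    """Hypothetical fetch function standing in for a JSON API request."""
    return [
        {'id': 1, 'name': 'North', 'status': 'open', 'notes': 'n/a'},
        {'id': 2, 'name': 'South', 'status': 'closed', 'notes': 'n/a'},
    ]


json_data_reader = make_reader(
    fetch_sites,
    transform=lambda record: record if record['status'] == 'open' else None,
    fields=['id', 'name'],
)
table_path = json_data_reader()
```

Keeping `get_data` as a zero-argument callable means the crate would not need to know anything about URLs, credentials, or filters; those would all be bound when the reader is built.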
I like it! I wasn't thinking about the issue with source_name not being a valid feature class name since we use it for destination_name by default. I wonder if we could make source_name and source_workspace optional. So that it could look something like this:
Crate(destination_name='MyFeatureClass', destination_workspace='database.gdb', get_data=data_reader)
Do you think a breaking change to the crate constructor/name lookup would make this nicer? Do you think it would create a lot of work for our pallets? Do you think it would be worth the improvement to the API surface to simplify the crate constructor?
I'm not sure. It would definitely be a lot of work. But the best that I could come up with for required init params is:
- source_name
- source_workspace
- destination_workspace

or

- destination_name
- destination_workspace
- get_data
Maybe that will become a mess in the future? Forklift is already a big, hairy beast so I'm open to simplification even if it requires some up-front work.
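To get a feel for how messy the two sets of required params might be, here is a small sketch of the validation. `CrateSketch` is a hypothetical stand-in, not the real Crate class, and it ignores every other constructor parameter.

```python
class CrateSketch:
    """Hypothetical stand-in for Crate that accepts either set of required parameters."""

    def __init__(self, source_name=None, source_workspace=None, destination_workspace=None,
                 destination_name=None, get_data=None):
        if destination_workspace is None:
            raise ValueError('destination_workspace is required')

        has_source = source_name is not None and source_workspace is not None
        has_reader = get_data is not None and destination_name is not None

        #: exactly one of the two parameter sets must be complete
        if has_source == has_reader:
            raise ValueError(
                'provide exactly one of: source_name and source_workspace, '
                'or destination_name and get_data'
            )

        self.source_name = source_name
        self.source_workspace = source_workspace
        self.destination_workspace = destination_workspace
        #: source_name remains the default for destination_name
        self.destination_name = destination_name or source_name
        self.get_data = get_data


table_crate = CrateSketch(
    source_name='Parcels', source_workspace='source.gdb', destination_workspace='database.gdb'
)
reader_crate = CrateSketch(
    destination_name='MyFeatureClass', destination_workspace='database.gdb',
    get_data=lambda: 'path/to/table'
)
```

With keyword arguments, existing table-based crates would keep working, while positional callers would break if the parameter order changed.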