Automatically load "datapackage.json" when supplied a path to a folder
A lot of frictionless implementations automatically load datapackage.json when reading a directory. It'd be nice to go
read_package("/path/to/packagedir")
instead of always
read_package("/path/to/packagedir/datapackage.json")
Would be useful, but it is a bit more complex than:
- Check if the provided path (or URL) ends with datapackage.json
- If not, append datapackage.json with file.path()
- Return error if that file could not be found
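For reference, the naive three-step version could be sketched as below. This is only an illustration: resolve_descriptor_path() is a hypothetical helper, not a function in frictionless-r, and it assumes that URLs start with http:// or https:// and skips the existence check for them.

```r
# Hypothetical helper for illustration; not part of frictionless-r.
resolve_descriptor_path <- function(path) {
  # Step 1: keep the path as is when it already names datapackage.json
  if (basename(path) != "datapackage.json") {
    # Step 2: otherwise treat it as a directory and append the file name
    path <- file.path(path, "datapackage.json")
  }
  # Step 3: error when the file is missing.
  # Assumption: only local paths are checked; URLs are returned unchecked.
  is_url <- grepl("^https?://", path)
  if (!is_url && !file.exists(path)) {
    stop("Can't find file at `", path, "`.", call. = FALSE)
  }
  path
}
```

Note that this sketch would wrongly append datapackage.json to a path such as /path/to/packagedir/datapackage.yml, which is the complication with the other file names.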
datapackage.yml and datapackage.yaml files are also valid, so we need to check whether the provided file path ends with one of those names. If not, we'll have to assume a datapackage.json file, potentially missing a datapackage.yml file that is present. Maybe we could check for those files as well before reporting an error.
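Checking for the other file names before reporting an error might look roughly like this for a local directory. find_descriptor() is a hypothetical name, and the order of preference (json, then yml, then yaml) is an assumption made for this sketch.

```r
# Hypothetical helper for illustration; not part of frictionless-r.
find_descriptor <- function(dir) {
  # Candidate descriptor names, in an assumed order of preference
  candidates <- file.path(
    dir,
    c("datapackage.json", "datapackage.yml", "datapackage.yaml")
  )
  found <- candidates[file.exists(candidates)]
  if (length(found) == 0) {
    stop("Can't find a datapackage file in `", dir, "`.", call. = FALSE)
  }
  # Return the first candidate that exists
  found[[1]]
}
```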
I think that is why I initially chose the verbose approach, especially since tab completion when writing the path immediately provides feedback on whether a file is present.
I'm curious to see how other frictionless software tackles this.
Ah, makes sense! I was wondering about that yaml stuff -- I saw yaml export functions in frictionless-py but so far have not seen a datapackage.yaml in the wild, so I was running with the assumption that the datapackage.json was the de facto standard.
The collections in the datahub were initially confusing to me to get working with frictionless-r because there was no direct link to the datapackage.json in their file listing (here, for example). The default behavior of their data-cli tool points to the root URL of the package though, and I found adding /datapackage.json did the trick.
I think it's nice (for new users especially) to be able to treat the datapackage as a sort of opaque blob they can load resources from (like tabs in an Excel file), without needing to think about the internal structure -- it also facilitates distributing packages as self-contained zip files.
@khusmann Thanks for investigating. datapackage.json is the only valid format according to the specs, but yml/yaml is supported by frictionless-py and it was requested and implemented as a feature for frictionless-r. I think for guessing a file, it's fine to follow frictionless-py (and the specs) and only look for a datapackage.json. I'll try to get your PR included in the next version.
Update:
- datapackage.json (that name and that extension) is still the standard in v2 for published packages (internal systems can use different names and formats, which is why frictionless-r also supports reading from a provided datapackage.yaml). I would therefore not implement functionality that starts looking for yaml if json cannot be found. frictionless-py doesn't either.
- I prefer not supporting a path to a directory. It's fairly easy to make it work for local directories (with file.info()$isdir), but what are the expectations for remote directories (like example.com/package)? As @khusmann points out that URL could be configured to serve the file or the user might expect the function to look at example.com/package/datapackage.json. That is 1) two calls and 2) making a call the user didn't request.
- I would therefore keep the functionality to providing a file, which is in line with the function argument read_package(file). I'm not against supporting a path to a zip file, which is 1) a file and 2) aligns more with the concept of a "sort of opaque blob they can load resources from". See #193.
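To illustrate the local versus remote asymmetry: the local check is a single call to file.info(), as in the sketch below, but it has no equivalent for a URL short of making a request. is_local_dir() is a hypothetical name used only here.

```r
# Hypothetical helper for illustration; not part of frictionless-r.
is_local_dir <- function(path) {
  # file.info() returns NA for isdir when the path does not exist locally
  # (which includes URLs), so isTRUE() maps that case to FALSE.
  isTRUE(file.info(path)$isdir)
}

is_local_dir(tempdir())
is_local_dir("https://example.com/package")
```

The first call returns TRUE and the second FALSE, without any network access. Whether example.com/package serves a descriptor directly or is a folder containing one cannot be decided this way.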
Closing this and associated #158.