dpm-js
Datapackage name differences
The package name in the tree output at the end of running `dpm install` and the directory created to hold the downloaded datapackage may not match if the name in the `datapackage.json` is not the same as that returned by okfn/datapackage-identifier's parse function (which uses the URL to work out the name).
e.g.
curl http://example.com/foo/datapackage.json
{
"name": "bar",
...
}
Using `dpm install` on this URL will put the files in `datapackages/foo` but the tree output at the end of the run will show `datapackages/bar`.
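To make the mismatch concrete, here is a minimal sketch. It assumes the URL-derived name is simply the path segment that contains `datapackage.json`; that matches the directories reported in this thread, but it is not taken from the actual datapackage-identifier code. `nameFromUrl` and `describeMismatch` are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: derive a package name from the URL alone by taking
// the path segment that contains datapackage.json. This is an assumption
// for illustration, not the real datapackage-identifier implementation.
function nameFromUrl(url) {
  const segments = new URL(url).pathname.split('/').filter(Boolean);
  // Drop a trailing "datapackage.json" so the containing directory is last.
  if (segments[segments.length - 1] === 'datapackage.json') {
    segments.pop();
  }
  return segments[segments.length - 1];
}

// Hypothetical check: compare the URL-derived name with the declared name.
function describeMismatch(url, descriptor) {
  const fromUrl = nameFromUrl(url);
  return {
    directory: `datapackages/${fromUrl}`,         // where the files would land
    treeLabel: `datapackages/${descriptor.name}`, // what the tree output would show
    mismatch: fromUrl !== descriptor.name,
  };
}

console.log(
  describeMismatch('http://example.com/foo/datapackage.json', { name: 'bar' })
);
```

For the example above, the sketch reports `datapackages/foo` as the directory and `datapackages/bar` as the tree label, which is the discrepancy described.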
Confirmed, for example https://gist.github.com/mchelen/7c972c58f921c58d8c32:
$ dpm install https://gist.githubusercontent.com/mchelen/7c972c58f921c58d8c32/raw/c57c987daf16f11ab4477ccfb76be780a32769f2/datapackage.json
dpm http GET https://gist.githubusercontent.com/mchelen/7c972c58f921c58d8c32/raw/c57c987daf16f11ab4477ccfb76be780a32769f2/data.csv
dpm http 200 https://gist.githubusercontent.com/mchelen/7c972c58f921c58d8c32/raw/c57c987daf16f11ab4477ccfb76be780a32769f2/data.csv
.
└─┬ datapackages
  └─┬ blargh
    ├── datapackage.json
    └─┬ data
      └── data.csv
$ find .
.
./datapackages
./datapackages/c57c987daf16f11ab4477ccfb76be780a32769f2
./datapackages/c57c987daf16f11ab4477ccfb76be780a32769f2/data.csv
./datapackages/c57c987daf16f11ab4477ccfb76be780a32769f2/datapackage.json
I guess the question is which one it should be, the URL or the `name`? I'm thinking `name`, because someone could host the files in any random directory structure.
I did some research on this a few days ago. If we can confirm that the name in `datapackage.json` should be the directory name for the install, I can create a PR with the changes.
Thanks @alvaropinot, there's actually an open issue on whether `name` stays as a required attribute in the `datapackage.json`: https://github.com/dataprotocols/dataprotocols/issues/237
@alvaropinot as per @danfowler, I think this is something that may change, and there are some other juicier items to work on if you are interested :smile: so I'd suggest we leave this one.
@rgrp sure, I'll be glad to help with whatever is juicier. Just tell me :D
@alvaropinot fantastic! OK how about looking at the "render" stuff e.g. #48. I've been working on the underlying lib so it would be good to sync - can you jump on https://gitter.im/frictionlessdata/chat and ping me ...
Experiencing the same behavior. When installing a package from GitHub, `dpm` uses the branch name as the installation folder (`master`), overwriting previous resource files if they share the same file names.
Perhaps `dpm` could use `name` as the installation folder, and default to the last part of the URL if `name` is not present.
@inigoflores thanks for reporting. There are two issues here:
- Using `master` as the data package name: that is a definite bug
- Using the `name` attribute vs. the package "name" in terms of the URL
As per the above discussion, we are considering deprecating `name`. Obviously in that case we need to use something like the URL or equivalent to create a storage path (a bit like Go). We are planning to work on this asap.
In the meantime, we should try to fix the bug with `master`. If you can track that down, that would be super helpful; it is probably an issue in datapackage-identifier.
@rgrp thanks for your prompt response.
- Regarding `master` as the package name, it's not a new bug per se. It's just that the URL to `datapackage.json` contains `master` as the last part of the path.

  dpm install https://raw.githubusercontent.com/codeforspain/ds-empleo/master/datapackage.json
  dpm install https://raw.githubusercontent.com/codeforspain/ds-organizacion-administrativa/master/datapackage.json

  Therefore, `dpm` installs every package under `datapackages/master`. Sorry for not describing the problem better.
- As for deprecating `name`, I've read issue dataprotocols/dataprotocols#237 with interest, and I see the case for moving towards a unique ID. However, I would prefer dealing with names rather than IDs.
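The collision described in the first point can be checked mechanically. The sketch below assumes, as observed in this thread, that the installation folder is the path segment containing `datapackage.json`; `folderFromUrl` and `findCollisions` are hypothetical helpers, not part of dpm or datapackage-identifier.

```javascript
// Hypothetical helper: folder name as the path segment that contains
// datapackage.json (an assumption based on the behavior reported here).
function folderFromUrl(url) {
  const segments = new URL(url).pathname.split('/').filter(Boolean);
  if (segments[segments.length - 1] === 'datapackage.json') {
    segments.pop();
  }
  return segments[segments.length - 1];
}

// Group URLs by derived folder and keep only folders claimed more than once.
function findCollisions(urls) {
  const byFolder = new Map();
  for (const url of urls) {
    const folder = folderFromUrl(url);
    byFolder.set(folder, [...(byFolder.get(folder) || []), url]);
  }
  return [...byFolder].filter(([, claimed]) => claimed.length > 1);
}

const collisions = findCollisions([
  'https://raw.githubusercontent.com/codeforspain/ds-empleo/master/datapackage.json',
  'https://raw.githubusercontent.com/codeforspain/ds-organizacion-administrativa/master/datapackage.json',
]);
console.log(collisions);
```

Under that assumption, both raw URLs map to the single folder `master`, so the second install would write into the first package's directory.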
What I was suggesting is to implement the following behavior:
- If `name` is present, use it as the installation folder.
- If `name` is missing, use `id` instead.
- If both are missing (I don't believe this scenario is allowed), use the URL.
Not sure if this makes sense.
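The three-step fallback suggested above could be sketched roughly as follows. This is only an illustration of the proposal, not dpm's current behavior; `installFolder` is a hypothetical function, and the URL fallback assumes the folder is the path segment containing `datapackage.json`.

```javascript
// Hypothetical sketch of the proposed rule: prefer `name`, then `id`,
// and only fall back to the URL when both are missing.
function installFolder(descriptor, url) {
  if (descriptor.name) {
    return descriptor.name;
  }
  if (descriptor.id) {
    return descriptor.id;
  }
  // Assumed URL fallback: the path segment that contains datapackage.json.
  const segments = new URL(url).pathname.split('/').filter(Boolean);
  if (segments[segments.length - 1] === 'datapackage.json') {
    segments.pop();
  }
  return segments[segments.length - 1];
}

const url =
  'https://raw.githubusercontent.com/codeforspain/ds-empleo/master/datapackage.json';

// The descriptor values below are made up for the example.
console.log(installFolder({ name: 'ds-empleo' }, url));
console.log(installFolder({ id: 'example-id' }, url));
console.log(installFolder({}, url));
```

A real implementation would probably also need to check that `name` or `id` is safe to use as a directory name before writing to disk.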
I've just discovered by reading the docs at /doc/command-identifier.md that you can actually install a package through its GitHub URL:
dpm install https://github.com/codeforspain/ds-empleo
dpm install https://github.com/codeforspain/ds-organizacion-administrativa
This solves my problem, as packages are installed under the right folder.
Perhaps these instructions should appear on Readme.md (or at least a link).
Thanks!