dafter
location of json files
Hello,
Thanks for dafter. I am considering using your framework to provide fetchers for some lexical datasets that I am assembling for the openlexicon project. I like your json description files, and I wrote a simple fetcher in R. I would like to give my users the possibility, if they want, to use dafter. Yet, if I am not mistaken, your framework does not seem to allow easily changing the location where the json files are stored; it took me a while to discover them in /usr/local/dafter. I think it might be good to have an option to specify a URL for the json files. Maybe I just did not look hard enough (?). Another small reservation is that the installation instructions require being root, so I had to check your install script before running it. Best regards
Hi Christophe,
Thank you very much for your interest in dafter. I took a look at the openlexicon project and I think it is a good idea!
You're right: for now, it is not possible to specify a URL for the json files. I think it would be a good idea to implement the feature you're proposing.
As for the location of the JSON files, I had in mind that, to add a new dataset configuration, the user would open a Pull Request to dafter and then update dafter, never touching the /usr/local/dafter folder. But I may be wrong about that. What do you think would be a better implementation?
Indeed, I can open a pull request to add some of my databases to your repo (once I have settled on definitive names...).
But the problem is that I do not expect many of my users to even be able to install dafter (many are Windows users). I have written a fetcher function that, given a json configuration file, automatically downloads the needed datasets without the user even noticing it. Thus, they can run my scripts without bothering about installing the datasets. I think that the database (i.e. the set of json files) cannot be local, as I intend to update it regularly and do not expect users to know how to do a git pull. But I would nevertheless like to encourage my users to use dafter to manage their local set of datasets.
Maybe dafter could have two new options (illustrated just below):
- '--from-json', which would take as input the json file associated with a dataset
- '--database', which would take a URL where all the json files are located and can be walked through
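For instance (purely hypothetical syntax, just to illustrate the idea; the exact command-line form is of course up to you):
dafter --from-json ./Lexique382.json
dafter --database https://raw.githubusercontent.com/chrplr/openlexicon/master/datasets-info/ Lexique382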
Here is an embryo of an R script to fetch a dataset. Actually, adding such fetcher functions for developers could be a valuable addition to dafter.
#! /usr/bin/env Rscript
# Time-stamp: <2019-04-26 13:45:31 [email protected]>
# Script to download openlexicon's datasets using dafter's json syntax (see https://github.com/vinzeebreak/dafter/)
# Example of usage:
#   fetch_dataset('Lexique382')

require("rjson")

# TODO: add an option to read the json file name from the command line
# args <- commandArgs(trailingOnly=TRUE)

# local dir where the datasets are saved
data.home <- Sys.getenv('DATASETS')
if (data.home == "") {
    data.home <- file.path(path.expand('~'), 'datasets')
}
dir.create(data.home, showWarnings=FALSE, recursive=TRUE)

# remote dir containing the json files describing the datasets
# (the raw URL is needed: the github.com .../blob/... page returns HTML, not json)
remote <- "https://raw.githubusercontent.com/chrplr/openlexicon/master/datasets-info/"

fetch_dataset <- function(dataset_id) {
    # paste0, not paste: paste() would insert spaces inside the URL
    json_file <- paste0(remote, dataset_id, '.json')
    json_data <- fromJSON(file=json_file)
    for (u in json_data$urls) {
        fname <- basename(u$url)
        destname <- file.path(data.home, fname)
        if (!file.exists(destname)) {
            download.file(u$url, destname)
        } else {
            warning('The file "', destname, '" already exists. Erase it first if you want to update it')
        }
    }
}
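And here is a minimal sketch of the '--from-json' idea in the same setting (the function name is hypothetical; it reuses data.home from the script above):

fetch_dataset_from_file <- function(json_path) {
    # same download logic as fetch_dataset(), but the dataset description
    # is read from a local json file instead of being fetched from 'remote'
    json_data <- fromJSON(file=json_path)
    for (u in json_data$urls) {
        destname <- file.path(data.home, basename(u$url))
        if (!file.exists(destname)) {
            download.file(u$url, destname)
        }
    }
}
# example: fetch_dataset_from_file('datasets-info/Lexique382.json')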
Hi Christophe,
I modified the installation. Now, it's a pip install. It's one step towards Windows use!
Concerning the fetcher functions, it's a great idea. For now, I want to do one thing and do it well. In the future, it would be great to have a loader for every dataset and for every Machine Learning framework (PyTorch, Tensorflow, Keras, scikit-learn).
Very nice, the pip install! That way, users will not need any special rights. Good!
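(I assume this now boils down to a single command along the lines of "pip install --user dafter", which indeed requires no special rights.)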
Chris