location of json files

Open chrplr opened this issue 5 years ago • 6 comments

Hello,

Thanks for dafter. I am considering using your framework to provide fetchers for some lexical datasets that I am assembling for the openlexicon project. I liked your json description files, and I wrote a simple fetcher in R. I would like to give my users the possibility, if they want, to use dafter.

Yet, if I am not mistaken, your framework does not seem to allow easily changing the location where the json files are stored. It took me a while to discover them in /usr/local/dafter. I think it might be good to have an option to specify a URL for the json files. Maybe I just did not look hard enough (?). Another small reservation is that the installation instructions require being root; I had to hack your install script before running it.

Best regards

chrplr avatar Apr 24 '19 20:04 chrplr

Hi Christophe,

Thank you very much for your interest in dafter. I took a look at the openlexicon project and I think your project is a good idea!

You're right: for now, it is not possible to specify a URL for the json files. I think it would be a good idea to implement the feature you're proposing.

And concerning the location of the JSON files, I had in mind that, to add a new dataset configuration, the user would have to do a Pull Request to dafter and then update dafter, and never touch the /usr/local/dafter folder. But I may be wrong about that. What do you think would be a better implementation?

vhoulbreque avatar Apr 25 '19 09:04 vhoulbreque

Indeed, I can do a pull request to add some of my databases to your repo (once I have settled on definitive names...).

But the problem is that I do not expect many of my users to even be able to install dafter (many are Windows users). I have written a fetcher function that, given a json configuration file, automatically downloads the needed datasets without the user even noticing it. Thus, they can use my scripts without bothering about installing the datasets. I think that the database (i.e. the set of json files) cannot be local, as I intend to update it regularly and do not expect users to know how to do a git pull. But I would nevertheless like to encourage my users to use 'dafter' to manage their local set of datasets.

Maybe dafter could have two new options (sketched after the list):

  • '--from-json' that would take as input the json file associated with a dataset
  • '--database' that would take a URL where all the json files are located and can be walked through.
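
Something like this (the `get` subcommand is just a guess at dafter's CLI, and the URL is a placeholder):

# hypothetical invocations of the two proposed options
dafter get --from-json ./Lexique382.json
dafter get --database https://example.org/datasets-info/ Lexique382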

chrplr avatar Apr 26 '19 11:04 chrplr

Here is an embryo of an R script to fetch a dataset. Actually, it could be a valuable addition to dafter to add fetcher functions for developers.

#! /usr/bin/env Rscript
# Time-stamp: <2019-04-26 13:45:31 [email protected]>

# script to download openlexicon's datasets using dafter json syntax (see https://github.com/vinzeebreak/dafter/)

# example of usage:
# fetch_dataset('Lexique382')

library(rjson)  # library() fails loudly if the package is missing, unlike require()

# TODO: add an option to read the json file name from the command line
# args = commandArgs(trailingOnly=TRUE)  

# localdir where datasets are saved
data.home <- Sys.getenv('DATASETS')
if (data.home == "") {
  data.home <- file.path(path.expand('~'), 'datasets')
}
dir.create(data.home, showWarnings=FALSE, recursive=TRUE)

# remote dir containing the json files describing the datasets
# (use raw.githubusercontent.com: the github.com/.../blob/ URL serves an HTML page, not the raw json)
remote <- "https://raw.githubusercontent.com/chrplr/openlexicon/master/datasets-info/"


fetch_dataset <- function(dataset_id)
{
    json_file <- paste0(remote, dataset_id, '.json')  # paste0: no spaces inserted between parts
    json_data <- fromJSON(file=json_file)

    for (u in json_data$urls)
    {
        fname <- basename(u$url)
        destname <- file.path(data.home, fname)
        if (!file.exists(destname)) {
            download.file(u$url, destname)
        }
        else {
            warning('The file \"', destname, '\" already exists. Erase it first if you want to update it')
        }
    }
}
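
The TODO above could be completed with something like this (a sketch, untested), so that datasets can be passed on the command line:

# fetch every dataset named on the command line
args <- commandArgs(trailingOnly=TRUE)
for (dataset_id in args) {
    fetch_dataset(dataset_id)
}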

chrplr avatar Apr 26 '19 11:04 chrplr

I modified the installation. Now, it's a pip install. It's one step towards Windows use!
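
(Presumably something like `pip install dafter`, assuming the package is published under that name.)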

Concerning the fetcher functions, it's a great idea. For now, I want to do one thing and do it well. In the future, it would be great to have a loader for every dataset and for every machine learning framework (PyTorch, TensorFlow, Keras, scikit-learn).
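
For what it's worth, such a loader in R could simply pair the fetcher above with a reader. A minimal sketch, assuming the dataset is a tab-separated text file (the file name below is hypothetical):

# fetch the dataset if needed, then read it into a data frame
load_dataset <- function(dataset_id, fname) {
    fetch_dataset(dataset_id)                  # downloads only if not already present
    read.delim(file.path(data.home, fname))    # assumes a tab-separated text file
}
# e.g.: lexique <- load_dataset('Lexique382', 'Lexique382.txt')  # hypothetical file name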

vhoulbreque avatar May 01 '19 18:05 vhoulbreque

Very nice, the pip install! Users will thus not need any special rights. Good!

Chris

chrplr avatar May 01 '19 18:05 chrplr