hydrus
CLI: add command to upload data from a JSON or CSV file
I'm submitting a
- [X] feature request.
Current Behaviour:
To bulk-load data into the server, the only option is to send individual PUT requests to the endpoint.
Expected Behaviour:
Data can be loaded by running a command that points to a local text file.
@Mec-iS Is it okay to add `pandas` as an additional dependency? It would help read from a variety of formats (`xlsx`, `pkl`, `json`, `csv`, `tsv`, SQL queries and a lot more).
However, it also bundles a number of data-processing modules that we don't need (keeping in mind that hydrus was meant to be "lightweight").
Always use standard library tools. The standard library has `csv` and `json` packages. The use of `pandas` is not justified at the moment.
When you say data, you mean instances of objects that the API serves, right?
I think it could be handy to have some text file having some preloaded data that can load instances right away. But we would need to define the format of such a file.
Yeah, by data I mean instances/objects that are actually served by the interface (the ones we store using the PUT method on the `Items` endpoint).
> But we would need to define the format of such a file.
Using standard formats is always the way to go. Besides JSON, it would probably be better to also support the different serialization formats for triples, for backward compatibility with older Knowledge Bases.
@vaibhavchellani ^^
@xadahiya @Mec-iS I think this issue is not solved yet? I can continue the work from https://github.com/HTTP-APIs/hydrus/pull/168, which was closed. Could you tell me why that PR was closed?
This is something needed, @sameshl. Also pinging @vedangj044, as I think he is working on it.
@sameshl #168 looks good, but it is specific to one case: it handles data for a particular resource, and only if the data doesn't contain any abstract property.
I am working on a generic preloading script that maps the column names of a CSV file to the resource names of a Hydra Doc.
Starting with resources that have 0 abstract properties, we load the data using the `crud.insert` function.
Then, for resources that need to link to other resources' properties, we first need to get the resource ID using a get function (this is not yet implemented).
This way, the data can be loaded from any CSV file.
Also, a broader solution should cover all types of sources, even Relational Databases. We can discuss 2 approaches for RDBs:
- Generic preloading script which automatically maps data to hydrus-generated database.
- Introducing a keyword named `SQL` in the Hydra vocab. This keyword would contain the SQL query needed to get data from the source database and populate the hydrus-generated DB. Example: `SQL: SELECT * FROM ARTIST;`. hydrus would then run this query and populate its database accordingly. I think this approach is much safer from a developer's point of view.
Why are batch endpoints useful? How can we add them to the existing REST API?
@Mec-iS Before working on this issue, should we set up a database config file and a `db_parser.py` for hydrus in a separate PR?
The workflow will be easier and more manipulation can be done with the database, as we will have a unified way to connect to the DB:
```python
from db_parser import get_db_url

DB_URL = get_db_url()
```
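A minimal sketch of what such a `db_parser.py` might contain, assuming an environment variable override and a JSON config file (the file name, key names, and fallback URL are all assumptions, not existing hydrus conventions):

```python
# db_parser.py (proposed): resolve the database URL from the environment
# or a config file, falling back to a local SQLite file database.
import json
import os

def get_db_url(config_path="db_config.json"):
    """Return the DB connection URL, preferring the environment."""
    url = os.environ.get("DB_URL")
    if url:
        return url
    if os.path.exists(config_path):
        with open(config_path) as f:
            # Assumed config shape: {"db_url": "..."}
            return json.load(f)["db_url"]
    # Fall back to the file database (SQLite) hydrus currently uses.
    return "sqlite:///database.db"
```

Centralizing the URL resolution this way would let every module connect the same way regardless of which backend is configured.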
Everything about the code is in the code. For now we only use a file database (SQLite). Ask your colleagues in Slack for general directions.
Is anyone working on this, at the moment?
I want to work on this issue!