Jovian_archive
Addition of PostgreSQL and individual tables to collect data from all Jovian runs
This FeatureRequest was originally posted by @florianzwagemaker on the internal GitLab repo. Transferring it to GitHub.
To collect data from all Jovian runs, an SQL-based database is needed to store larger sets of information efficiently.
PostgreSQL is a relatively easy-to-use database that can handle medium to very large datasets without excessive resource usage.
Additionally, Postgres can scale to the data needs of the Jovian project while remaining flexible, which allows the database and its structure to be modified when necessary.
Other databases, such as MySQL/MariaDB or NoSQL databases (Cassandra, MongoDB, or Couchbase), do not offer this to the same degree, which makes them harder to use either for larger datasets or in terms of flexibility.
In Jovian, a PostgreSQL database can be paired with csvkit, an easy-to-use suite of command-line tools.
With csvkit, data can be inserted into the PostgreSQL database through shell commands in a Snakemake rule.
For example, the following csvkit command creates a table from a CSV file and inserts its rows into PostgreSQL: csvsql --db postgresql:/// --insert /path/to/unclassified_data.csv
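To illustrate roughly what `csvsql --insert` does (read the CSV header, create a matching table, then insert each row), here is a minimal stdlib-only Python sketch. It is not csvkit's implementation. It uses SQLite from Python's standard library as a stand-in for PostgreSQL so it runs without a database server, and it stores every column as TEXT, whereas csvsql infers column types. The helper names `insert_csv` and `quote_ident` and the sample columns `sample,reads` are hypothetical. By default csvsql names the table after the file name without its extension, which is why the sketch uses `unclassified_data`.

```python
import csv
import io
import sqlite3


def quote_ident(name: str) -> str:
    # Double any embedded quotes so arbitrary table/column names are valid SQL identifiers.
    return '"' + name.replace('"', '""') + '"'


def insert_csv(conn: sqlite3.Connection, table: str, csv_file) -> int:
    """Create `table` from the CSV header and insert every data row; return the row count.

    Simplification (assumption): every column is created as TEXT, whereas csvsql infers types.
    """
    reader = csv.reader(csv_file)
    header = next(reader)  # the first line holds the column names
    columns = ", ".join(f"{quote_ident(col)} TEXT" for col in header)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {quote_ident(table)} ({columns})")
    placeholders = ", ".join("?" for _ in header)
    rows = list(reader)
    conn.executemany(f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    # Hypothetical sample data standing in for /path/to/unclassified_data.csv.
    sample = io.StringIO("sample,reads\nS1,120\nS2,87\n")
    conn = sqlite3.connect(":memory:")
    print(insert_csv(conn, "unclassified_data", sample))
    print(conn.execute("SELECT * FROM unclassified_data").fetchall())
```

In the actual proposal, the same step would be the csvsql command above, run from a Snakemake rule against the PostgreSQL server; keeping one table per output type (such as the unclassified data) matches the "individual tables" idea in the title.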