galaxy icon indicating copy to clipboard operation
galaxy copied to clipboard

Add database data table

Open selten opened this issue 4 years ago • 6 comments

Current data tables do not support locking of the underlying files when additions or removals are made. Due to the multi-process nature of Galaxy deployments, this could cause corruptions to occur within the data table. This PR aims to resolve that issue by storing the data commonly found in data tables inside the database of the Galaxy server.

selten avatar Jun 12 '20 18:06 selten

@blankenberg Do you think you could review this at some point?

selten avatar Aug 28 '20 09:08 selten

We'd like to have a broader conversation around this - it would be a big change - but one thing that came up during the committers meeting is that it would be better if this used a separate sqlite database. There is an example of something like that in https://github.com/galaxyproject/galaxy/tree/dev/lib/galaxy/model/tool_shed_install.

jmchilton avatar Jan 06 '21 15:01 jmchilton

We'd like to have a broader conversation around this - it would be a big change - but one thing that came up during the committers meeting is that it would be better if this used a separate sqlite database. There is an example of something like that in https://github.com/galaxyproject/galaxy/tree/dev/lib/galaxy/model/tool_shed_install.

I think putting this onto a SQLite database might actually negate some of the benefits that are gained by using Galaxy's included database (especially when normally using PostgreSQL). I think going for SQLite would mean that each data table would require its own SQLite file to maintain concurrency of multiple data managers running often and there may still be problems due to the concurrency within Galaxy with opening the SQLite database and writing to it simultaneously with multiple processes. Especially when Galaxy is installed on e.g. NFS (https://www.sqlite.org/howtocorrupt.html#_filesystems_with_broken_or_missing_lock_implementations). Therefore I believe moving to SQLite could result in the same issues as I'm currently experiencing with the files.

By having it inside PostgreSQL (which would be the default for me), I avoid the locking problem and thereby the corruption. Would using a separate schema in PostgreSQL be another option?

selten avatar Jan 06 '21 16:01 selten

By having it inside PostgreSQL (which would be the default for me), I avoid the locking problem and thereby the corruption. Would using a separate schema in PostgreSQL be another option?

An optional separate database connection (so any DBMS) is actually how the mentioned tool_shed_install database is configured, see install_database_connection in config/galaxy.yml.sample .

nsoranzo avatar Jan 06 '21 20:01 nsoranzo

Is there a way to write some integration tests for this - like subclass the test_data_manager.py tests to populate these? Or will these not work with data managers?

jmchilton avatar Apr 07 '21 13:04 jmchilton

Is there a way to write some integration tests for this - like subclass the test_data_manager.py tests to populate these? Or will these not work with data managers?

I added some tests now, these are fully compatible with data managers.

selten avatar Apr 07 '21 22:04 selten