sgn
partition nd_experiment table for defined phenotype storage use-case
Expected Behavior
There are use-cases, such as storing phenotypic values, where the nd_experiment table is becoming overloaded in large databases.
I propose adding a new 'nd_experiment_phenotype_bridge' table to provide the defined links (e.g. linking phenotype to stock to project in the case of storing phenotypes).
This table would have foreign keys to the stock, project, phenotype, metadata.md_files, metadata.md_image, and metadata.md_json tables, and could include the operator and upload_date fields that are currently stored in nd_experimentprop.
This would reduce the number of tables to join over by not using the nd_experiment_* linking tables (e.g. nd_experiment_stock, nd_experiment_project, nd_experiment_phenotype) or the phenome.nd_experiment_md_* tables (e.g. nd_experiment_md_files, nd_experiment_md_images, nd_experiment_md_json) in this very rigidly defined scenario. In summary, this would bring the number of intermediate tables to join over from 7 to 1.
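As a rough sketch of what such a bridge table could look like (the table and column names here are illustrative assumptions, not a final design; referenced key columns follow the usual Chado/sgn naming):

```sql
-- Hypothetical single linking table replacing the 7 intermediate tables
-- for the rigidly defined "store a phenotype" case.
CREATE TABLE nd_experiment_phenotype_bridge (
    nd_experiment_phenotype_bridge_id serial PRIMARY KEY,
    stock_id     integer NOT NULL REFERENCES stock (stock_id),
    project_id   integer NOT NULL REFERENCES project (project_id),
    phenotype_id integer NOT NULL REFERENCES phenotype (phenotype_id),
    -- optional links to associated metadata (column names assumed):
    file_id      integer REFERENCES metadata.md_files (file_id),
    image_id     integer REFERENCES metadata.md_image (image_id),
    json_id      integer REFERENCES metadata.md_json (json_id),
    -- fields currently stored as nd_experimentprop rows:
    operator     varchar(255),
    upload_date  timestamp DEFAULT now()
);
```

With this in place, fetching all phenotype values for a trial becomes a single join from project through the bridge to phenotype, instead of walking nd_experiment_project, nd_experiment, nd_experiment_stock, and nd_experiment_phenotype.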
We should use database-level partitioning of the nd_experiment table.
Also, maybe we can delete the nd_experiment_phenotype linking table? Isn't it unnecessary? (A phenotype will never belong to two experiments at the same time, so the relationship is not many-to-many, but one-to-many.)
Don't delete any of the Chado tables (e.g. nd_experiment_phenotype), because that will break some GMOD tools and Bio::Chado::Schema; we can simply not use them. Also, if you add any tables to the public schema, we won't be able to easily use Bio::Chado::Schema for queries.
More info on db level table partitions: https://www.postgresql.org/docs/10/ddl-partitioning.html
There is a create_date column in the nd_experiment table that could be used to partition the table. We would create a partition for each year. Data is usually created in yearly intervals, so that would improve performance significantly. Before you implement the different schema, I would definitely first try something like that.
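A minimal sketch of what yearly range partitioning on create_date could look like with the declarative partitioning linked above (column list abbreviated to illustrate the mechanism; note that in PostgreSQL 10 the partitioned parent cannot carry a primary key or unique constraint, which would need handling in a real migration):

```sql
-- Parent table partitioned by the creation timestamp.
CREATE TABLE nd_experiment (
    nd_experiment_id bigserial,
    type_id          integer NOT NULL,
    create_date      timestamp NOT NULL
) PARTITION BY RANGE (create_date);

-- One partition per year; inserted rows are routed automatically.
CREATE TABLE nd_experiment_y2017 PARTITION OF nd_experiment
    FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');
CREATE TABLE nd_experiment_y2018 PARTITION OF nd_experiment
    FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');

-- Index each partition on the partition key so queries constrained
-- by create_date prune to a single year's partition.
CREATE INDEX ON nd_experiment_y2017 (create_date);
CREATE INDEX ON nd_experiment_y2018 (create_date);
```

Queries that filter on create_date (e.g. loading one season's trial data) would then scan only the matching year's partition rather than the whole table.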
Just revisited this idea today with Lukas. I'll look into partitioning again in more depth.
Amber: We should have a hackathon about this