Add annotation_type to our database
From #1120, we have identified that we should capture the annotation_type field
This should be for both PPI and GRN interactions. I can envision a feature where users can select whether they want to look at just manually curated, just high-throughput, or both types of interactions on GRNs and PPIs that are loaded from the database.
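A rough sketch of how that selection could work once annotation_type is stored with each interaction. Everything here is hypothetical: the function name, the dictionary layout, the placeholder gene names, and the literal values "manually curated" and "high-throughput" are assumptions for illustration, not the actual GRNsight schema or the values the source data uses.

```python
def filter_by_annotation_type(interactions, annotation_type=None):
    """Return interactions matching annotation_type, or all of them if None.

    `interactions` is assumed to be a list of dicts that each carry an
    "annotation_type" key (hypothetical layout).
    """
    if annotation_type is None:
        # "Both" in the UI: no filtering at all.
        return list(interactions)
    return [i for i in interactions if i["annotation_type"] == annotation_type]


# Placeholder rows, not real data.
edges = [
    {"regulator": "GENE_A", "target": "GENE_B", "annotation_type": "manually curated"},
    {"regulator": "GENE_A", "target": "GENE_C", "annotation_type": "high-throughput"},
    {"regulator": "GENE_B", "target": "GENE_C", "annotation_type": "manually curated"},
]

curated = filter_by_annotation_type(edges, "manually curated")
everything = filter_by_annotation_type(edges)
```

In practice the filter would more likely be a WHERE clause in the database query than a Python list comprehension, but the three-way choice (one type, the other, or both) would be the same.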
See comment on #1120 for next steps
As we start the semester by cleaning house, let’s focus on completing this verification
The overall immediate goal is to have a 2025 database dump that @kdahlquist can validate and that we can send to production
I received an error when trying to update my database. It seems to be an issue with the gene table:
Adding data to database.................................................
Data from script-results/source_data.tsv has been successfully populated.
===============================================
Traceback (most recent call last):
File "/Users/ceciliazaragoza/Documents/LMU-classes/GRNsight/database/network-database/main.py", line 58, in <module>
main(args.network, args.db_url)
File "/Users/ceciliazaragoza/Documents/LMU-classes/GRNsight/database/network-database/main.py", line 48, in main
adding_data_to_databse(network_option, db_url)
File "/Users/ceciliazaragoza/Documents/LMU-classes/GRNsight/database/network-database/main.py", line 33, in adding_data_to_databse
GeneDataPopulator(db_url, network_mode).populate_data()
File "/Users/ceciliazaragoza/Documents/LMU-classes/GRNsight/database/network-database/database_services/populator.py", line 66, in populate_data
self.process_file(conn, cursor, self.filepath, copy_statement)
File "/Users/ceciliazaragoza/Documents/LMU-classes/GRNsight/database/network-database/database_services/populator.py", line 54, in process_file
cursor.copy_expert(sql=copy_statement, file=f)
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "gene_pkey"
DETAIL: Key (gene_id, taxon_id)=(YCLX01W, 559292) already exists.
CONTEXT: COPY gene, line 2
Probably best to drill down with @ntran18 since she was the most recent person to revise the database schema
Originally, the schema for the gene table did not include time_stamp in the primary key. Last semester, @ceciliazaragoza populated fresh data into an empty table, so there was no issue. However, when @ceciliazaragoza ran the script again, she was updating a table that already held data. Because time_stamp was not part of the primary key, rows with the same (gene_id, taxon_id) collided and the duplicate key error was raised. Thanks @ceciliazaragoza for testing again!
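To make the cause concrete, here is a minimal reproduction of the behavior described above. It is only a sketch under stated assumptions: it uses Python's built-in sqlite3 with an in-memory database as a stand-in for PostgreSQL and psycopg2, the gene table is cut down to three columns, and the time stamp values are made up. Only the gene_id and taxon_id values come from the error message.

```python
import sqlite3


def try_reload(primary_key: str) -> str:
    """Insert the same gene twice under the given primary key and report the result."""
    conn = sqlite3.connect(":memory:")
    # Simplified, hypothetical version of the gene table.
    conn.execute(
        "CREATE TABLE gene ("
        " gene_id TEXT, taxon_id TEXT, time_stamp TEXT,"
        f" PRIMARY KEY ({primary_key}))"
    )
    insert = "INSERT INTO gene (gene_id, taxon_id, time_stamp) VALUES (?, ?, ?)"
    # First load into an empty table always succeeds.
    conn.execute(insert, ("YCLX01W", "559292", "2024-01-01"))
    try:
        # Second load: same gene, newer (made-up) time stamp.
        conn.execute(insert, ("YCLX01W", "559292", "2025-01-01"))
    except sqlite3.IntegrityError:
        return "duplicate key"
    finally:
        conn.close()
    return "ok"


print(try_reload("gene_id, taxon_id"))              # prints "duplicate key"
print(try_reload("gene_id, taxon_id, time_stamp"))  # prints "ok"
```

With only (gene_id, taxon_id) as the key, the second load fails in the same way the COPY did. Once time_stamp is part of the key, the newer row is accepted alongside the old one.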
@kdahlquist I have uploaded a new folder in Box called GRNsight 2025 AllianceMine New Database Debugging for further offline analysis if needed.
Closing this because the specific work is done and we have transitioned to overall database integrity checking and loading in #1120