
Add annotation_type to our database

Open dondi opened this issue 1 year ago • 2 comments

From #1120, we have identified that we should capture the annotation_type field

dondi avatar Jan 22 '25 17:01 dondi

This should be for both PPI and GRN interactions. I can envision a feature where users can select whether they want to look at just manually curated, just high-throughput, or both types of interactions on GRNs and PPIs that are loaded from the database.
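A minimal sketch of what that selection could look like once annotation_type is stored with each interaction. The row layout, the gene names, and the two label strings ("manually curated" and "high-throughput") are assumptions for illustration, not the actual GRNsight schema or stored values.

```python
def filter_by_annotation_type(interactions, selected_types):
    """Keep only the interactions whose annotation_type the user selected."""
    selected = set(selected_types)
    return [row for row in interactions if row["annotation_type"] in selected]


# Hypothetical rows; real ones would come from the GRN or PPI tables.
interactions = [
    {"source": "GENE_A", "target": "GENE_B", "annotation_type": "manually curated"},
    {"source": "GENE_A", "target": "GENE_C", "annotation_type": "high-throughput"},
    {"source": "GENE_B", "target": "GENE_C", "annotation_type": "manually curated"},
]

curated_only = filter_by_annotation_type(interactions, ["manually curated"])
both_types = filter_by_annotation_type(
    interactions, ["manually curated", "high-throughput"]
)
```

Selecting both types returns every row, so the same function covers all three choices described above.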

kdahlquist avatar Jan 22 '25 18:01 kdahlquist

See comment on #1120 for next steps

dondi avatar Mar 19 '25 16:03 dondi

To start the semester by cleaning house, let’s focus on completing this verification

dondi avatar Aug 27 '25 17:08 dondi

The overall immediate goal is to have a 2025 database dump that @kdahlquist can validate and that we can send to production

dondi avatar Aug 27 '25 17:08 dondi

I received an error when trying to update my database. It seems to be an issue with the gene table:

Adding data to database.................................................
Data from script-results/source_data.tsv has been successfully populated.
===============================================
Traceback (most recent call last):
  File "/Users/ceciliazaragoza/Documents/LMU-classes/GRNsight/database/network-database/main.py", line 58, in <module>
    main(args.network, args.db_url)
  File "/Users/ceciliazaragoza/Documents/LMU-classes/GRNsight/database/network-database/main.py", line 48, in main
    adding_data_to_databse(network_option, db_url)
  File "/Users/ceciliazaragoza/Documents/LMU-classes/GRNsight/database/network-database/main.py", line 33, in adding_data_to_databse
    GeneDataPopulator(db_url, network_mode).populate_data()
  File "/Users/ceciliazaragoza/Documents/LMU-classes/GRNsight/database/network-database/database_services/populator.py", line 66, in populate_data
    self.process_file(conn, cursor, self.filepath, copy_statement)
  File "/Users/ceciliazaragoza/Documents/LMU-classes/GRNsight/database/network-database/database_services/populator.py", line 54, in process_file
    cursor.copy_expert(sql=copy_statement, file=f)
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "gene_pkey"
DETAIL:  Key (gene_id, taxon_id)=(YCLX01W, 559292) already exists.
CONTEXT:  COPY gene, line 2

ceciliazaragoza avatar Sep 03 '25 16:09 ceciliazaragoza

Probably best to drill down with @ntran18 since she was the most recent person to revise the database schema

dondi avatar Sep 03 '25 17:09 dondi

Originally, the schema for the gene table did not include time_stamp in the primary key. Last semester, @ceciliazaragoza populated fresh data into an empty table, so there was no issue. However, when @ceciliazaragoza ran the script again, she was updating an already populated table with new data. Because time_stamp was not part of the primary key, a duplicate key error was raised. Thanks @ceciliazaragoza for testing again!
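
The behavior described above can be reproduced in a small, self-contained sketch. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL and psycopg2, and the three-column gene table and the time_stamp values are assumptions for illustration; only the key (YCLX01W, 559292) comes from the error message.

```python
import sqlite3


def load_twice(primary_key):
    """Create a gene table with the given primary key and load the same gene twice."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gene ("
        "gene_id TEXT, taxon_id TEXT, time_stamp TEXT, "
        f"PRIMARY KEY ({primary_key}))"
    )
    insert = "INSERT INTO gene (gene_id, taxon_id, time_stamp) VALUES (?, ?, ?)"
    try:
        # First load into an empty table: always succeeds.
        conn.execute(insert, ("YCLX01W", "559292", "2024-01-01"))
        # Second load of the same gene with a newer (hypothetical) time stamp.
        conn.execute(insert, ("YCLX01W", "559292", "2025-09-03"))
        return "ok"
    except sqlite3.IntegrityError as error:
        return f"rejected: {error}"
    finally:
        conn.close()


# Key without time_stamp: the second load collides with the first.
old_schema_result = load_twice("gene_id, taxon_id")
# Key with time_stamp: both versions of the row can coexist.
new_schema_result = load_twice("gene_id, taxon_id, time_stamp")
```

With the two-column key the second insert is rejected, which corresponds to the UniqueViolation on gene_pkey in the traceback; with time_stamp in the key both rows are accepted.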

ntran18 avatar Sep 09 '25 20:09 ntran18

@kdahlquist I have uploaded a new folder in Box called GRNsight 2025 AllianceMine New Database Debugging for further offline analysis if needed.

ntran18 avatar Sep 09 '25 20:09 ntran18

Closing this because the specific work is done and we have transitioned to overall database integrity checking and loading in #1120

dondi avatar Sep 24 '25 17:09 dondi