
Attach timestamps to genes

Open dondi opened this issue 1 year ago • 3 comments

Emerging from the discussion of #1120, we have resolved that the gene table should itself have a timestamp. Even though a high percentage of genes will be the same from year to year, producing nearly identical records, we don't want to associate genes that didn't exist at a particular time with the interactions from that time

dondi avatar Jan 22 '25 17:01 dondi
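The timestamped gene table described above can be sketched as follows. This is a minimal illustration using SQLite, not the actual GRNsight PostgreSQL schema; the table and column names (`gene`, `gene_id`, `display_name`, `time_stamp`) are assumptions for demonstration only.

```python
import sqlite3

# Hypothetical sketch: a gene table that carries its own timestamp, so each
# yearly snapshot is distinguishable even when the records are otherwise
# identical, and genes absent at a given time never join that time's
# interactions. Names are illustrative assumptions, not the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE gene (
        gene_id      TEXT NOT NULL,
        display_name TEXT,
        time_stamp   TEXT NOT NULL,   -- when this snapshot was loaded
        PRIMARY KEY (gene_id, time_stamp)
    )
""")

# The same gene can appear under two snapshots...
conn.execute("INSERT INTO gene VALUES ('YJL056C', 'ZAP1', '2024-01-01')")
conn.execute("INSERT INTO gene VALUES ('YJL056C', 'ZAP1', '2025-01-01')")
# ...while a gene absent from the 2024 snapshot appears only in 2025.
conn.execute("INSERT INTO gene VALUES ('YNEW001', 'NEW1', '2025-01-01')")

# Queries scoped to a timestamp only see genes that existed at that time.
rows_2024 = conn.execute(
    "SELECT display_name FROM gene WHERE time_stamp = '2024-01-01'"
).fetchall()
print([r[0] for r in rows_2024])
```

The composite primary key `(gene_id, time_stamp)` is one way to let the same gene coexist across snapshots without duplicate-key conflicts.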

@ceciliazaragoza reported that initial queries that include the timestamp now run without errors, but for ZAP1, which regulates itself, the updated query does not return the expected self-regulation adjacency item

@kdahlquist checked the 2025 data and confirmed that ZAP1 does still regulate itself, so @ceciliazaragoza will investigate the query further. @ntran18 can investigate the latest loaded data to see if the issue resides in the database scripts instead

dondi avatar Feb 26 '25 16:02 dondi
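The missing self-regulation adjacency item can be illustrated with a minimal sketch. This uses SQLite and assumed table/column names (`network`, `regulator_gene_id`, `target_gene_id`, `time_stamp`), not the actual GRNsight query; it only shows that a self-regulation edge is simply a row whose regulator equals its target.

```python
import sqlite3

# Minimal sketch (assumed names) of checking that a self-regulating gene
# like ZAP1 comes back as an adjacency item from a timestamped network table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE network (
        regulator_gene_id TEXT,
        target_gene_id    TEXT,
        time_stamp        TEXT
    )
""")
conn.executemany(
    "INSERT INTO network VALUES (?, ?, ?)",
    [
        ("ZAP1", "ZAP1", "2025-01-01"),  # self-regulation edge
        ("ZAP1", "ZRT1", "2025-01-01"),
    ],
)

# A self-regulation edge is a row whose regulator equals its target.
self_edges = conn.execute(
    """
    SELECT regulator_gene_id FROM network
    WHERE regulator_gene_id = target_gene_id
      AND time_stamp = '2025-01-01'
    """
).fetchall()
print(self_edges)
```

If a query like this returns nothing for ZAP1, either the query's filters are wrong or the ZAP1→ZAP1 row was never loaded, which is exactly the split being investigated here.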

The issue with ZAP1 is not in the database queries but in the scripts that get data from AllianceMine. I checked the script, and the data we get does not show ZAP1 regulating itself. I need to investigate the queries further.

ntran18 avatar Feb 26 '25 17:02 ntran18
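One way to confirm that the problem is upstream of the database is to scan the fetched records for self-pairs before loading anything. The record shape below (`regulator`/`target` keys) is a hypothetical assumption about what the AllianceMine-fetching script produces, not its actual output format.

```python
def find_self_regulators(interactions):
    """Return the sorted set of gene names that regulate themselves in a
    list of fetched interaction records. The regulator/target key names
    are assumptions for illustration."""
    return sorted(
        {row["regulator"] for row in interactions
         if row["regulator"] == row["target"]}
    )

# If the fetched data lacks a ZAP1 -> ZAP1 row, the database query cannot
# possibly return it, which points the investigation at the fetch step.
fetched = [
    {"regulator": "ZAP1", "target": "ZRT1"},
    {"regulator": "GLN3", "target": "GLN3"},
]
print(find_self_regulators(fetched))
print("ZAP1" in find_self_regulators(fetched))
```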

See comment on #1120 for next steps

dondi avatar Mar 19 '25 16:03 dondi

See the follow-up comment at https://github.com/dondi/GRNsight/issues/1165#issuecomment-3229108256 for how to proceed with this

dondi avatar Aug 27 '25 17:08 dondi

Need help from @ceciliazaragoza @MilkaZek @Amelie1253 to verify if the code is working properly.

Since this PR modified the schema, you might need to drop the old schemas for gene_regulatory_network_with_timestamp and protein_protein_interactions_with_timestamp

  1. Connect to the local database: psql postgresql://localhost/postgres

  2. View all schemas: \dn

  3. Check if you have protein_protein_interactions_with_timestamp and gene_regulatory_network_with_timestamp. If so, drop these schemas:

    DROP SCHEMA protein_protein_interactions_with_timestamp CASCADE;
    DROP SCHEMA gene_regulatory_network_with_timestamp CASCADE;

  4. Load new schemas:

    a. Go to the database/schema folder
    b. Create the schemas for those two tables:

    psql -f protein_protein_interactions_with_timestamp_schema.sql postgresql://localhost/postgres
    psql -f gene_regulatory_network_with_timestamp_schema.sql postgresql://localhost/postgres

  5. Populate new data:

    a. Go to the database/network-database folder
    b. Populate the new data into the database:

    python3 main.py --network all --db_url postgresql://localhost/postgres

    Notes: Make sure you read the README.md file in the database folder so that you have all the dependencies needed. Also check whether you already have a Python virtual environment set up; if so, activate it before running this command

  6. Update data: Lastly, it would be great to check that we can update the database in the future. Please wait 5 minutes, then run the script to populate the data again:

    python3 main.py --network all --db_url postgresql://localhost/postgres
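Step 6 is effectively checking that re-running the loader updates existing rows rather than duplicating them. A minimal sketch of that upsert behavior, using SQLite and assumed table/column names rather than the real GRNsight schema:

```python
import sqlite3

# Hypothetical sketch of why re-running the population script should be
# safe: an upsert keyed on the gene id updates the timestamp in place
# instead of inserting a duplicate row. Names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gene (gene_id TEXT PRIMARY KEY, time_stamp TEXT)"
)

def populate(conn, time_stamp):
    # ON CONFLICT ... DO UPDATE is SQLite's upsert form (PostgreSQL has
    # the same construct), so repeated loads converge to one row per gene.
    conn.execute(
        """
        INSERT INTO gene (gene_id, time_stamp) VALUES ('YJL056C', ?)
        ON CONFLICT (gene_id) DO UPDATE SET time_stamp = excluded.time_stamp
        """,
        (time_stamp,),
    )

populate(conn, "2025-09-10T10:00")   # first load
populate(conn, "2025-09-10T10:05")   # re-run five minutes later

rows = conn.execute("SELECT gene_id, time_stamp FROM gene").fetchall()
print(rows)
```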

Please let me know if you run into any errors. Thank you!

ntran18 avatar Sep 10 '25 17:09 ntran18

I followed @ntran18's instructions through step 5 but got these errors. Not sure if I should continue to step 6?

[Two screenshots of the error output were attached]

Amelie1253 avatar Sep 23 '25 18:09 Amelie1253

Ughhh, I don't like this .... There is nothing for us to do. The error is coming from InterMine ....

ntran18 avatar Sep 23 '25 22:09 ntran18

I can't tell, are the errors from the description field? Are we even saving this data in our database?

kdahlquist avatar Sep 23 '25 22:09 kdahlquist

No, we won't save any data to the database until we finish fetching all the data

ntran18 avatar Sep 23 '25 22:09 ntran18

So, we need to determine if this problem is occurring with anyone else right now. If we are downloading all the data and then throwing a lot of it away, does it make sense to modify the query to just grab the stuff we need?

kdahlquist avatar Sep 23 '25 22:09 kdahlquist

I tested it and I have the problem too.

ntran18 avatar Sep 23 '25 22:09 ntran18

We can modify the query to retrieve only the necessary information, but we need to make sure we test the query (I have done this before, and it sometimes failed). I can look into this issue more tomorrow

ntran18 avatar Sep 23 '25 22:09 ntran18
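The "grab only the stuff we need" idea can be sketched as a projection applied to each fetched record before anything is stored. The field names here (`regulator`, `target`, `annotation_type`, `description`) are illustrative assumptions, not the actual AllianceMine fields; the point is simply to drop unused fields such as the free-text description that appears to trigger the errors.

```python
# Sketch: instead of fetching every field and discarding most of it,
# project each record down to a fixed set of required keys before saving.
# Field names are hypothetical, for illustration only.
REQUIRED_FIELDS = ("regulator", "target", "annotation_type")

def project(record, fields=REQUIRED_FIELDS):
    """Keep only the fields the loader actually stores, tolerating
    records where a field is absent."""
    return {k: record[k] for k in fields if k in record}

full_record = {
    "regulator": "ZAP1",
    "target": "ZRT1",
    "annotation_type": "manually curated",
    "description": "very long free-text field we never store",
}
trimmed = project(full_record)
print(trimmed)
```

Restricting the query itself (rather than projecting after download) has the same effect plus less data transferred, but as noted above the narrower query needs testing, since it has failed before.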

Closing this because the specific work is done and we have transitioned to overall database integrity checking and loading in #1120

dondi avatar Sep 24 '25 17:09 dondi