Attach timestamps to genes
Emerging from the discussion of #1120, we have resolved that the gene table should itself have a timestamp. Even if a high percentage of genes will be the same from year to year, producing nearly identical records, we don't want a network from a given time to include genes that didn't exist at that time.
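To make the intent concrete, here is a minimal sketch of the idea (the gene names, field names, and snapshot values below are made up for illustration and are not the actual GRNsight schema): with a per-snapshot timestamp on the gene table, each network's gene set can be restricted to the genes that existed at that time.

```python
# Illustrative only: field names ("gene", "time_stamp") and snapshot values
# are assumptions, not the real GRNsight schema.
genes = [
    {"gene": "ZAP1", "time_stamp": "2024"},
    {"gene": "ZAP1", "time_stamp": "2025"},
    {"gene": "NEW1", "time_stamp": "2025"},  # hypothetical gene that exists only in 2025
]

def genes_at(snapshot):
    """Return the set of genes that existed at the given snapshot."""
    return {g["gene"] for g in genes if g["time_stamp"] == snapshot}

# A 2024 network can then be limited to genes_at("2024"), so the
# hypothetical 2025-only gene never appears alongside 2024 interactions.
print(genes_at("2024"))  # {'ZAP1'}
```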
@ceciliazaragoza reported that initial queries that include the timestamp now run without errors, but for ZAP1, which regulates itself, the updated query does not return the expected self-regulation adjacency entry.
@kdahlquist checked the 2025 data and confirmed that ZAP1 does still regulate itself, so @ceciliazaragoza will investigate the query further. @ntran18 can investigate the latest loaded data to see if the issue resides in the database scripts instead.
The issue with ZAP1 is not from the database queries but from the scripts that fetch data from AllianceMine. I checked the script, and the data we retrieve does not include ZAP1's self-regulation. I need to investigate the queries further.
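A check along these lines could confirm whether the fetched data contains the self-edge. The edge-list shape below is a made-up example for illustration, not the script's actual data structure:

```python
def has_self_regulation(edges, gene):
    """Return True if `gene` appears as both regulator and target of one edge."""
    return any(regulator == target == gene for regulator, target in edges)

# Made-up fetched edges: in this toy list the ZAP1 -> ZAP1 edge is present,
# but per this issue it is missing from the actual AllianceMine results.
fetched = [("GLN3", "ZAP1"), ("ZAP1", "ZRT1"), ("ZAP1", "ZAP1")]
print(has_self_regulation(fetched, "ZAP1"))  # True
```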
See comment on #1120 for next steps
A follow-up comment on how to proceed can be seen at https://github.com/dondi/GRNsight/issues/1165#issuecomment-3229108256
Need help from @ceciliazaragoza @MilkaZek @Amelie1253 to verify if the code is working properly.
Since this PR modified the schema, you might need to drop the old `gene_regulatory_network_with_timestamp` and `protein_protein_interactions_with_timestamp` schemas.
1. Connect to the local database:

   ```shell
   psql postgresql://localhost/postgres
   ```

2. View all schemas:

   ```
   \dn
   ```

3. Check whether you have `protein_protein_interactions_with_timestamp` and `gene_regulatory_network_with_timestamp`. If so, drop these schemas:

   ```sql
   DROP SCHEMA protein_protein_interactions_with_timestamp CASCADE;
   DROP SCHEMA gene_regulatory_network_with_timestamp CASCADE;
   ```

4. Load the new schemas:

   a. Go to the `database/schema` folder.

   b. Create the schemas for those two tables:

   ```shell
   psql -f protein_protein_interactions_with_timestamp_schema.sql postgresql://localhost/postgres
   psql -f gene_regulatory_network_with_timestamp_schema.sql postgresql://localhost/postgres
   ```

5. Populate the new data:

   a. Go to the `database/network-database` folder.

   b. Populate the database:

   ```shell
   python3 main.py --network all --db_url postgresql://localhost/postgres
   ```

   Notes: Make sure you read the `README.md` file in the `database` folder so that you have all the needed dependencies. Also check whether you already have a Python virtual environment set up; if so, please activate it before running this command.

6. Updating data: Lastly, it would be great to check that we can update the database in the future. Please wait 5 minutes, then run the populate script again:

   ```shell
   python3 main.py --network all --db_url postgresql://localhost/postgres
   ```
Please let me know if you run into any errors. Thank you!
I followed @ntran18's instructions through step 5 but got these errors. Not sure if I should continue to step 6?
Ughhh, I don't like this .... There is nothing for us to do. The error is coming from InterMine ....
I can't tell: are the errors coming from the description field? Are we even saving this data in our database?
No, we won't save any data to the database until we finish fetching all the data
So, we need to determine if this problem is occurring with anyone else right now. If we are downloading all the data and then throwing a lot of it away, does it make sense to modify the query to just grab the stuff we need?
I tested it and I have the problem too.
We can modify the query to retrieve only the necessary information, but we need to make sure we test the query (I have done this before, and sometimes it failed). I can look into this issue more tomorrow.
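On trimming the query: a minimal sketch of the idea, with hypothetical column names (the real AllianceMine rows and the loader's columns may differ), is to project each fetched row down to just the fields we store, so bulky fields like `description` are never kept:

```python
# Hypothetical column names; the loader's actual schema may differ.
NEEDED = ("regulator", "target", "annotation_type")

def project(row, needed=NEEDED):
    """Keep only the fields we store, dropping everything else."""
    return {key: row[key] for key in needed}

raw = {
    "regulator": "ZAP1",
    "target": "ZRT1",
    "annotation_type": "manually curated",
    "description": "long free-text field we never store",
}
print(project(raw))  # {'regulator': 'ZAP1', 'target': 'ZRT1', 'annotation_type': 'manually curated'}
```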
Closing this because the specific work is done and we have transitioned to overall database integrity checking and loading in #1120