open-cravat
open-cravat copied to clipboard
filtersqlite
I was using the filtersqlite as:
oc util filtersqlite main_file.sqlite -f filter.json -s filtered
The main_file.sqlite
is 88Mb while the main_file.filtered.sqlite is 232Gb.
What's happening here?
That is indeed strange. Can you let me know the result of the following commands?
For main_file,
>sqlite3 main_file.sqlite
sqlite>select count(*) from variant;
sqlite>select count(*) from gene;
sqlite>select count(*) from sample;
sqlite>select count(*) from mapping;
For main_file.filtered.sqlite,
>sqlite3 main_file.filtered.qlite
sqlite>select count(*) from variant;
sqlite>select count(*) from gene;
sqlite>select count(*) from sample;
sqlite>select count(*) from mapping;
If there is any difference between the two count(*)
for a table, further investigation can be done with
For main_file,
>sqlite3 main_file.sqlite
sqlite>select * from <table name> limit 50;
For main_file.filtered.sqlite,
sqlite>select * from <table name> limit 50;
Here are the counts. From the main sqlite file:
sqlite> .open toy_annot.sqlite
sqlite> select count(*) from variant;
711
sqlite> select count(*) from gene;
318
sqlite> select count(*) from sample;
7112
sqlite> select count(*) from mapping;
711
I tried using util filtersqlite
and use filter.json file created using oc gui but ended up with sqlite3.OperationalError: no such table: viewersetup
. I used oc report toy_annot.sqlite -t vcf -f toy_filter.json
to filter into a vcf file and run oc on the filtered vcf.
The counts on this filtered database
sqlite> .open toy_annot_oc_filtered.sqlite
sqlite> select count(*) from variant;
51
sqlite> select count(*) from gene;
41
sqlite> select count(*) from sample;
332
sqlite> select count(*) from mapping;
51
I also used my SQL code to filter variants using same criteria. This is the count I get:
sqlite> .open toy_annot_sql_filtered.sqlite
sqlite> select count(*) from variant;
602
sqlite> select count(*) from gene;
231
sqlite> select count(*) from sample;
5800
sqlite> select count(*) from mapping;
602
I can provide the toy example data that I used to recreate the issue if necessary.
Sorry for the slow response. If you are still interested in this issue, please send the toy example data.