open-cravat icon indicating copy to clipboard operation
open-cravat copied to clipboard

filtersqlite

Open spaul-genetics opened this issue 3 years ago • 2 comments

I was using the filtersqlite as:

oc util filtersqlite main_file.sqlite -f filter.json -s filtered

The main_file.sqlite is 88Mb while the main_file.filtered.sqlite is 232Gb.

What's happening here?

spaul-genetics avatar Mar 03 '22 20:03 spaul-genetics

That is indeed strange. Can you let me know the result of the following commands?

For main_file,
>sqlite3 main_file.sqlite
sqlite>select count(*) from variant;
sqlite>select count(*) from gene;
sqlite>select count(*) from sample;
sqlite>select count(*) from mapping;

For main_file.filtered.sqlite,
>sqlite3 main_file.filtered.qlite
sqlite>select count(*) from variant;
sqlite>select count(*) from gene;
sqlite>select count(*) from sample;
sqlite>select count(*) from mapping;

If there is any difference between the two count(*) for a table, further investigation can be done with

For main_file,
>sqlite3 main_file.sqlite
sqlite>select * from <table name> limit 50;

For main_file.filtered.sqlite,
sqlite>select * from <table name> limit 50;

rkimoakbioinformatics avatar Mar 03 '22 21:03 rkimoakbioinformatics

Here are the counts. From the main sqlite file:

sqlite> .open toy_annot.sqlite
sqlite> select count(*) from variant;
711
sqlite> select count(*) from gene;
318
sqlite> select count(*) from sample;
7112
sqlite> select count(*) from mapping;
711

I tried using util filtersqlite and use filter.json file created using oc gui but ended up with sqlite3.OperationalError: no such table: viewersetup. I used oc report toy_annot.sqlite -t vcf -f toy_filter.json to filter into a vcf file and run oc on the filtered vcf.

The counts on this filtered database

sqlite> .open toy_annot_oc_filtered.sqlite
sqlite> select count(*) from variant;
51
sqlite> select count(*) from gene;
41
sqlite> select count(*) from sample;
332
sqlite> select count(*) from mapping;
51

I also used my SQL code to filter variants using same criteria. This is the count I get:

sqlite> .open toy_annot_sql_filtered.sqlite
sqlite> select count(*) from variant;
602
sqlite> select count(*) from gene;
231
sqlite> select count(*) from sample;
5800
sqlite> select count(*) from mapping;
602

I can provide the toy example data that I used to recreate the issue if necessary.

spaul-genetics avatar May 10 '22 21:05 spaul-genetics

Sorry for the slow response. If you are still interested in this issue, please send the toy example data.

kmoad avatar Mar 09 '23 21:03 kmoad