Enable htsget to utilize BigQuery labels (generated by Partitioning)

Open samanvp opened this issue 7 years ago • 2 comments
After landing PR #246 and fixing #248, we need to enable htsget to use the labels that the Partitioning step assigns to the output BigQuery tables. This is the last step needed to fully realize the benefit of partitioning the output table, namely reducing query cost (in dollar terms).

samanvp avatar May 31 '18 18:05 samanvp

Saman, can you please clarify (or add pointers to) how htsget currently (i.e., before partitioning) integrates with the BigQuery output of Variant Transforms?

bashir2 avatar Jun 01 '18 17:06 bashir2

As I understand it, the structure of the Variant Transforms output tables can get fairly complicated depending on the partition config file. Consequently, running queries on those tables can become cumbersome.

To save users from writing overly complicated queries, we would like to offer an API that lets them write queries such as: select * from BASE_TABLE_NAME_* where reference_name="chr2"

We would then use the fields in the where clause to limit the search to the subset of BASE_TABLE_NAME_* tables that are labeled as containing "chr2" variants.
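
A minimal sketch of that table-selection idea, for illustration only and not the actual htsget or Variant Transforms implementation. It assumes each output table carries a `reference_name` label; the label key, the table suffixes, the `TABLE_LABELS` mapping, and both helper functions are hypothetical. In practice the labels would be read from BigQuery table metadata rather than from a hard-coded dict, and the where clause would need a real SQL parser rather than a regex.

```python
import re

# Hypothetical label data: table name -> labels assigned by the Partitioning step.
# The label key "reference_name" and the table suffixes are assumptions.
TABLE_LABELS = {
    "BASE_TABLE_NAME_chr1": {"reference_name": "chr1"},
    "BASE_TABLE_NAME_chr2": {"reference_name": "chr2"},
    "BASE_TABLE_NAME_residual": {},
}

# Label keys that the selection step is allowed to filter on (assumed).
LABEL_KEYS = {"reference_name"}

# Matches simple equality terms such as: reference_name="chr2"
_EQ_FILTER = re.compile(r"""(\w+)\s*=\s*["']([^"']+)["']""")


def extract_label_filters(query):
    """Return {field: value} for equality terms in the WHERE clause that
    refer to a known label key. Other terms are ignored here and would be
    left for BigQuery itself to evaluate."""
    parts = re.split(r"\bwhere\b", query, maxsplit=1, flags=re.IGNORECASE)
    if len(parts) < 2:
        return {}
    return {
        field: value
        for field, value in _EQ_FILTER.findall(parts[1])
        if field in LABEL_KEYS
    }


def select_tables(query, table_labels):
    """Return the sorted names of tables whose labels match every label filter."""
    filters = extract_label_filters(query)
    return sorted(
        table
        for table, labels in table_labels.items()
        if all(labels.get(field) == value for field, value in filters.items())
    )


query = 'select * from BASE_TABLE_NAME_* where reference_name="chr2"'
print(select_tables(query, TABLE_LABELS))
```

With the sample data above, only the table labeled `chr2` is selected, so the wildcard query would be run against that one table instead of all of them. A query with no label filter falls back to every table.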

samanvp avatar Jun 01 '18 22:06 samanvp