gcp-variant-transforms
Enable htsget to utilize BigQuery labels (generated by Partitioning)
After landing PR #246 and fixing #248, we need to enable htsget to use the labels that the partitioning step assigns to the output BigQuery tables. This is the last step needed to fully realize the benefit of partitioning the output table, namely a lower query cost (in dollar terms).
Saman, can you please clarify (or add pointers) on how htsget currently (i.e., before partitioning) integrates with BigQuery output of Variant Transforms?
As I understand it, the structure of the Variant Transforms output tables can get fairly complicated depending on the partition config file. Consequently, running queries on those tables can become cumbersome.
To save users from writing overcomplicated queries, we would like to offer an API that lets them write queries such as: select * from BASE_TABLE_NAME_* where reference_name="chr2"
We would then use the fields in the where clause to limit the search to the subset of BASE_TABLE_NAME_* tables that are labeled as containing "chr2" variants.
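To make the idea concrete, here is a minimal sketch of the table-selection step in plain Python. It is only an illustration under several assumptions: the label key `reference_name`, the table suffixes, the `TABLE_LABELS` mapping, and the `tables_for_query` helper are all hypothetical, and the regex handles only a single `reference_name = "..."` equality rather than real SQL parsing. In practice the labels would be read from BigQuery table metadata instead of a hard-coded dict.

```python
import re

# Hypothetical label layout: table name -> labels attached by the partitioning step.
# Real labels would come from BigQuery table metadata, not a hard-coded dict.
TABLE_LABELS = {
    "BASE_TABLE_NAME_chr1": {"reference_name": "chr1"},
    "BASE_TABLE_NAME_chr2": {"reference_name": "chr2"},
    "BASE_TABLE_NAME_residual": {"reference_name": "residual"},
}

# Matches a single equality filter such as: reference_name="chr2"
_REFERENCE_NAME_RE = re.compile(
    r"""reference_name\s*=\s*["']([^"']+)["']""", re.IGNORECASE
)


def tables_for_query(query, table_labels):
    """Return the sorted table names whose label matches the query's filter.

    If the query has no reference_name filter, every table is returned,
    which corresponds to scanning BASE_TABLE_NAME_* in full.
    """
    match = _REFERENCE_NAME_RE.search(query)
    if match is None:
        return sorted(table_labels)
    wanted = match.group(1)
    return sorted(
        name
        for name, labels in table_labels.items()
        if labels.get("reference_name") == wanted
    )


if __name__ == "__main__":
    q = 'select * from BASE_TABLE_NAME_* where reference_name="chr2"'
    print(tables_for_query(q, TABLE_LABELS))
```

With a lookup like this, the user-facing query stays simple while the actual scan is restricted to the labeled tables, which is where the cost saving would come from.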