bigquery-examples
bigquery-examples copied to clipboard
optimize all queries using FLATTEN
Restructure them so that the clause within the FLATTEN returns only the columns needed and also applies any applicable WHERE clauses.
For example compare:
SELECT
contig,
position,
reference_bases,
GROUP_CONCAT(alternate_bases) WITHIN RECORD AS alt,
vt,
END,
super_population,
IF(af >= 0.05,
TRUE,
FALSE) AS is_common_variant,
IF(genotype.first_allele > 0
OR genotype.second_allele > 0,
1,
0) AS has_variant
FROM
FLATTEN([google.com:biggene:1000genomes.variants1kG],
genotype) AS samples
JOIN
[google.com:biggene:1000genomes.sample_info] p
ON
samples.genotype.sample_id = p.sample
WHERE
contig = '17'
AND position BETWEEN 41196312
AND 41277500
to a better version of the same thing - notice that the inner most query is now returning a lot less data to the outer query:
SELECT
contig,
position,
reference_bases,
alt,
vt,
END,
super_population,
is_common_variant,
has_variant
FROM
FLATTEN(
(
SELECT
contig,
position,
reference_bases,
GROUP_CONCAT(alternate_bases) WITHIN RECORD AS alt,
vt,
END,
IF(af >= 0.05,
TRUE,
FALSE) AS is_common_variant,
genotype.sample_id,
IF(genotype.first_allele > 0
OR genotype.second_allele > 0,
1,
0) AS has_variant
FROM
[google.com:biggene:1000genomes.variants1kG]
WHERE
contig = '17'
AND position BETWEEN 41196312
AND 41277500),
genotype) AS samples
JOIN
[google.com:biggene:1000genomes.sample_info] p
ON
samples.genotype.sample_id = p.sample
IF(af >= 0.05, TRUE, FALSE) AS is_common_variant
same as
af >= 0.05 AS is_common_variant
?
agreed, can be simplified
I'm pretty sure that pattern is in a few other places too, filed #14
please point out other things like this! Thanks Felipe