cbioportal
Sample profile count disparity
The legacy query uses gene panel data. When there is no gene panel (WES?), we still get a row per sample because of the LEFT JOIN, even though both sample_id and panel_id will be NULL. Perhaps we always get a row per sample, in which case the join does not limit the returned set. This sometimes differs from a query against the sample_profile table, which returns a subset.
SELECT sample_profile.sample_id, sample_profile.panel_id
FROM sample
INNER JOIN patient ON sample.patient_id = patient.internal_id
INNER JOIN cancer_study ON patient.cancer_study_id = cancer_study.cancer_study_id
LEFT JOIN genetic_profile ON cancer_study.cancer_study_id = genetic_profile.cancer_study_id
LEFT JOIN sample_profile ON sample_profile.genetic_profile_id = genetic_profile.genetic_profile_id
    AND sample.internal_id = sample_profile.sample_id
LEFT JOIN gene_panel ON sample_profile.panel_id = gene_panel.internal_id
WHERE genetic_profile.stable_id = 'brain_cptac_2020_mutations';
For example, querying sample_profile directly:
SELECT *
FROM sample_profile
JOIN genetic_profile gp ON sample_profile.genetic_profile_id = gp.genetic_profile_id
WHERE gp.stable_id = 'brain_cptac_2020_mutations';
The question is: which is correct as a measure of whether a given sample is profiled? The legacy query discards any information in sample_profile. What I don't understand is how the legacy query above could ever return a subset. Since it uses a LEFT JOIN, it would seem there will always be a row per sample, whether or not there is a matching gene panel. And yet some profiles can return a subset.
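To make the row-count difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema is a stripped-down assumption that keeps only the columns the two queries touch; it is not the real cBioPortal schema. The data and the stable id 'toy_mutations' are made up. It covers only the case where the LEFT JOIN keeps a row for an unprofiled sample; it does not explain how the legacy query could return a subset.

```python
import sqlite3

# Toy in-memory database. Table and column names mirror the queries above,
# but the schema is a simplified assumption, not the real cBioPortal schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cancer_study (cancer_study_id INTEGER PRIMARY KEY);
CREATE TABLE patient (internal_id INTEGER PRIMARY KEY, cancer_study_id INTEGER);
CREATE TABLE sample (internal_id INTEGER PRIMARY KEY, patient_id INTEGER);
CREATE TABLE genetic_profile (
    genetic_profile_id INTEGER PRIMARY KEY,
    stable_id TEXT,
    cancer_study_id INTEGER
);
CREATE TABLE sample_profile (sample_id INTEGER, genetic_profile_id INTEGER, panel_id INTEGER);
CREATE TABLE gene_panel (internal_id INTEGER PRIMARY KEY);

INSERT INTO cancer_study VALUES (1);
INSERT INTO patient VALUES (1, 1), (2, 1), (3, 1);
INSERT INTO sample VALUES (10, 1), (11, 2), (12, 3);
INSERT INTO genetic_profile VALUES (100, 'toy_mutations', 1);
-- Samples 10 and 11 are profiled without a gene panel; sample 12 is not profiled.
INSERT INTO sample_profile VALUES (10, 100, NULL), (11, 100, NULL);
""")

# Legacy-style query: the LEFT JOINs keep every sample in the study.
# sample.internal_id is selected as well so the unprofiled sample is visible.
legacy_rows = conn.execute("""
SELECT sample.internal_id, sample_profile.sample_id, sample_profile.panel_id
FROM sample
INNER JOIN patient ON sample.patient_id = patient.internal_id
INNER JOIN cancer_study ON patient.cancer_study_id = cancer_study.cancer_study_id
LEFT JOIN genetic_profile ON cancer_study.cancer_study_id = genetic_profile.cancer_study_id
LEFT JOIN sample_profile ON sample_profile.genetic_profile_id = genetic_profile.genetic_profile_id
    AND sample.internal_id = sample_profile.sample_id
LEFT JOIN gene_panel ON sample_profile.panel_id = gene_panel.internal_id
WHERE genetic_profile.stable_id = 'toy_mutations'
ORDER BY sample.internal_id
""").fetchall()

# Direct query: only samples that actually have a sample_profile row.
direct_rows = conn.execute("""
SELECT sample_profile.sample_id
FROM sample_profile
JOIN genetic_profile gp ON sample_profile.genetic_profile_id = gp.genetic_profile_id
WHERE gp.stable_id = 'toy_mutations'
ORDER BY sample_profile.sample_id
""").fetchall()

print(len(legacy_rows), legacy_rows)
print(len(direct_rows), direct_rows)
```

In this toy data the legacy-style query returns three rows, with sample 12 carrying NULL for both sample_id and panel_id, while the direct query returns only the two profiled samples. A NULL panel_id alone therefore cannot separate "profiled without a panel" from "not profiled"; in this sketch only sample_profile.sample_id does.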