datahub
Validation script enhancements
TODO List:
- [ ] Update the validation script to use the portal API instead of the static JSON files (this depends on the gene table cleanup that addresses duplicate Entrez ID mappings). Also update the unit tests.
- [ ] Remove the dependency on the static `Allowed_data_types.txt` file for genetic alteration combinations.
- [ ] Add a rule to not accept `NA` or multiple `-` characters for indels.
- [ ] Add a check to identify duplicate rows in MAF files based on the key columns (`Entrez_Gene_Id`, `Chromosome`, `Start_Position`, `End_Position`, `Variant_Classification`, `Tumor_Seq_Allele2`, `HGVSp_Short` and `Tumor_Sample_Barcode`). The validator doesn't catch this issue, but the import fails.
- [ ] Add a check that profile names are normalized, based on issue https://github.com/cBioPortal/datahub/issues/1233. If we do not want to require the same of external users, add a separate level of constraints, e.g. a command-line argument that turns on MSKCC-specific checks.
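A minimal sketch of the duplicate-row check for MAF files, using the key columns listed in the TODO item. It assumes a tab-separated MAF whose comment lines start with `#` and whose first non-comment line is the header; `find_duplicate_maf_rows` is a hypothetical helper name, not part of the existing validator.

```python
from typing import Dict, Iterable, List, Tuple

# Key columns from the TODO item: rows sharing all of these values are duplicates.
MAF_KEY_COLUMNS = (
    "Entrez_Gene_Id",
    "Chromosome",
    "Start_Position",
    "End_Position",
    "Variant_Classification",
    "Tumor_Seq_Allele2",
    "HGVSp_Short",
    "Tumor_Sample_Barcode",
)


def find_duplicate_maf_rows(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (duplicate_line, first_line) pairs using 1-based file line numbers."""
    key_indices: List[int] = []
    header_seen = False
    first_seen: Dict[Tuple[str, ...], int] = {}
    duplicates: List[Tuple[int, int]] = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#version'-style comment lines
        fields = line.split("\t")
        if not header_seen:
            # Assumption: the first non-comment line is the column header.
            missing = [c for c in MAF_KEY_COLUMNS if c not in fields]
            if missing:
                raise ValueError(f"MAF header lacks key columns: {missing}")
            key_indices = [fields.index(c) for c in MAF_KEY_COLUMNS]
            header_seen = True
            continue
        # Pad short rows with "" so a truncated line still yields a full key.
        key = tuple(fields[i] if i < len(fields) else "" for i in key_indices)
        if key in first_seen:
            duplicates.append((line_no, first_seen[key]))
        else:
            first_seen[key] = line_no
    return duplicates


if __name__ == "__main__":
    # Toy MAF: line 4 repeats the key values of line 3.
    row = "\t".join(["7157", "17", "7577120", "7577120",
                     "Missense_Mutation", "T", "p.R273H", "S1"])
    maf = ["#version 2.4", "\t".join(MAF_KEY_COLUMNS), row, row]
    print(find_duplicate_maf_rows(maf))  # [(4, 3)]
```

Reporting the line of the first occurrence alongside the duplicate lets the validator point curators at both rows instead of failing later at import time.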
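The indel rule could be sketched as a per-row check. This assumes indels are identified by `Variant_Type` values `INS` and `DEL`, and that the allele columns to inspect are `Reference_Allele` and `Tumor_Seq_Allele2`; both are assumptions, and `indel_allele_errors` is a hypothetical name. Each row is a column-to-value dict, so it can be fed from `csv.DictReader(..., delimiter="\t")`.

```python
from typing import Dict, List

# Assumptions: indels are flagged via Variant_Type, and a missing allele in
# these columns should be written as a single "-".
INDEL_VARIANT_TYPES = {"INS", "DEL"}
ALLELE_COLUMNS = ("Reference_Allele", "Tumor_Seq_Allele2")


def indel_allele_errors(row: Dict[str, str]) -> List[str]:
    """Return error messages for one MAF row given as a column->value dict."""
    if row.get("Variant_Type", "").strip().upper() not in INDEL_VARIANT_TYPES:
        return []  # the rule only applies to insertions and deletions
    errors: List[str] = []
    for column in ALLELE_COLUMNS:
        value = row.get(column, "").strip()
        if value == "NA":
            errors.append(f"{column} is 'NA' for an indel")
        elif len(value) > 1 and set(value) == {"-"}:
            # e.g. "--" instead of the single "-" placeholder
            errors.append(f"{column} is {value!r}; expected a single '-'")
    return errors
```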
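One way to add an opt-in constraint level for profile names is a command-line flag that only enables the stricter check when set. The flag name `--strict-mskcc`, the functions below, and the normalization rule itself are all hypothetical placeholders: the actual rule would come from issue #1233, and here "normalized" is simply assumed to mean no leading, trailing, or repeated whitespace.

```python
import argparse
from typing import List


def is_normalized(name: str) -> bool:
    # Placeholder rule (assumption): whitespace already collapsed and trimmed.
    return name == " ".join(name.split())


def profile_name_errors(name: str, strict: bool) -> List[str]:
    errors: List[str] = []
    if not name.strip():
        # Illustrative base rule that applies to every user.
        errors.append("profile_name must not be empty")
    elif strict and not is_normalized(name):
        # Stricter rule, only enforced when the flag is passed.
        errors.append(f"profile_name {name!r} is not normalized")
    return errors


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of a study validator")
    parser.add_argument("study_dir")
    parser.add_argument(
        "--strict-mskcc",
        action="store_true",
        help="also enforce MSKCC-specific constraints such as normalized profile names",
    )
    return parser
```

Usage: `args = build_parser().parse_args(["my_study", "--strict-mskcc"])` followed by `profile_name_errors(name, args.strict_mskcc)`. External users who omit the flag only get the base checks.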
Related issue: samples with mutations but not in the sequenced case list (#308)
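The mismatch described in #308 amounts to a set difference between the sample IDs in the MAF and those in the sequenced case list. The sketch below assumes the case list file uses `key: value` lines with tab-separated sample IDs under `case_list_ids`; the helper names are hypothetical.

```python
from typing import Iterable, List, Set


def parse_case_list_ids(case_list_text: str) -> Set[str]:
    """Collect sample IDs from the `case_list_ids:` line of a case list file."""
    for line in case_list_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "case_list_ids":
            # Assumption: IDs are tab-separated after the colon.
            return {s.strip() for s in value.split("\t") if s.strip()}
    raise ValueError("case list has no case_list_ids field")


def samples_missing_from_sequenced(
    maf_sample_ids: Iterable[str], sequenced_case_list_text: str
) -> List[str]:
    """Samples that have mutations but are absent from the sequenced case list."""
    sequenced = parse_case_list_ids(sequenced_case_list_text)
    return sorted(set(maf_sample_ids) - sequenced)
```

Feeding it the `Tumor_Sample_Barcode` values from the MAF would let the validator report these samples before import.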
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.