Validation script enhancements

Open rmadupuri opened this issue 4 years ago • 4 comments

TODO List:

  • [ ] Update the validation script to use the portal API instead of the static JSON files (depends on the gene table cleanup, which addresses duplicate Entrez ID mappings). Also update the unit tests.
  • [ ] Remove the dependency on the static Allowed_data_types.txt file for genetic alteration combinations.
  • [ ] Add a rule that rejects NA or multiple -'s for indels.
  • [ ] Add a check that identifies duplicate rows in the MAF based on the key columns (Entrez_Gene_Id, Chromosome, Start_Position, End_Position, Variant_Classification, Tumor_Seq_Allele2, HGVSp_Short and Tumor_Sample_Barcode). The validator does not catch this issue, but the import fails.
  • [ ] Add a check that the profile names are normalized, based on issue https://github.com/cBioPortal/datahub/issues/1233. If we do not want to hold external users to the same rule, add a separate level of constraint checking, e.g. a command-line argument that enables MSKCC-specific constraints.
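
For the duplicate-row item above, a minimal sketch of what such a check could look like. This is not the validator's actual code: the function name `find_duplicate_maf_rows` and the constant `KEY_COLS` are hypothetical, and the sketch assumes a tab-separated MAF whose comment lines start with `#` and whose first non-comment line is the header.

```python
from collections import defaultdict

# Key columns taken from the TODO item above.
KEY_COLS = (
    "Entrez_Gene_Id", "Chromosome", "Start_Position", "End_Position",
    "Variant_Classification", "Tumor_Seq_Allele2", "HGVSp_Short",
    "Tumor_Sample_Barcode",
)


def find_duplicate_maf_rows(lines):
    """Map each repeated key tuple to the 1-based line numbers where it occurs.

    `lines` is any iterable of text lines, e.g. an open file handle.
    (Hypothetical helper, not part of the existing validation script.)
    """
    seen = defaultdict(list)
    key_idx = None
    for line_no, line in enumerate(lines, start=1):
        # Skip comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\r\n").split("\t")
        if key_idx is None:
            # First data line is the header: locate the key columns.
            missing = [c for c in KEY_COLS if c not in fields]
            if missing:
                raise ValueError("MAF is missing key columns: %s" % ", ".join(missing))
            key_idx = [fields.index(c) for c in KEY_COLS]
            continue
        # Treat a short row as having empty values in the missing fields.
        key = tuple(fields[i] if i < len(fields) else "" for i in key_idx)
        seen[key].append(line_no)
    # Keep only keys that occur on more than one line.
    return {key: nums for key, nums in seen.items() if len(nums) > 1}
```

Calling it as `find_duplicate_maf_rows(open(path))` would return an empty dict for a clean file; each entry otherwise lists the line numbers that share a key, which the validator could report as an error before import.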

rmadupuri avatar Mar 04 '20 20:03 rmadupuri

Another check to add: samples that have mutations but are not in the sequenced case list (#308).

yichaoS avatar Jun 18 '20 18:06 yichaoS

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Sep 16 '20 18:09 stale[bot]

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jun 06 '21 00:06 stale[bot]

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jan 09 '22 07:01 stale[bot]