Minor bugs in `create_gene_binary()`

Open akriti21 opened this issue 1 year ago • 4 comments

Creating one issue with all the feedback, since I'm not sure which of these we actually want to work on as next steps.

1. As of now, the function breaks if the user passes a data frame to `samples` instead of a vector, and gives the error: None of your selected samples have alterations in your data.
We could check the class of the `samples` argument first and, if it's not a vector, throw a more specific error saying it must be a vector. @karissawhiting - Just checked, this is done!
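
A minimal sketch of that kind of check in base R. `check_samples()` is a hypothetical helper written for illustration; it is not the check that gnomeR actually implements, and the sample IDs are made up.

```r
# Hypothetical helper: not part of gnomeR.
check_samples <- function(samples) {
  # Data frames and other lists are not atomic; matrices are atomic but have dims.
  if (!is.atomic(samples) || !is.null(dim(samples))) {
    stop(
      "`samples` must be a vector of sample IDs, not an object of class ",
      paste(class(samples), collapse = "/"), ".",
      call. = FALSE
    )
  }
  invisible(samples)
}

check_samples(c("id-1", "id-2"))              # passes silently
try(check_samples(data.frame(id = "id-1")))   # fails with the specific error
```

Failing early like this keeps the user from seeing the unrelated "None of your selected samples have alterations" error.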

2. No warning, message, or error is raised if the user enters duplicate sample IDs. The function returns unique rows for mutation data but not for CNA and fusion data. @hfuchs5
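
One possible way to handle this, sketched in base R. `dedupe_samples()` is a hypothetical helper, not a gnomeR function, and whether duplicates should trigger a warning or an error is a decision for the maintainers.

```r
# Hypothetical helper: not part of gnomeR.
dedupe_samples <- function(samples) {
  # IDs that occur more than once
  dups <- unique(samples[duplicated(samples)])
  if (length(dups) > 0) {
    warning(
      "Duplicate sample IDs were dropped: ",
      paste(dups, collapse = ", "),
      call. = FALSE
    )
  }
  unique(samples)
}

ids <- c("s1", "s2", "s1")
clean_ids <- suppressWarnings(dedupe_samples(ids))
```

Deduplicating the IDs once, before any of the mutation, CNA, or fusion data is processed, would make the three data types behave consistently.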

3. The `mut_type` argument works fine to filter for somatic if `mut_type = 's'` (case sensitive). So if you enter `mut_type = 's'` or `mut_type = 'somatic_only'`, it works fine and returns all data. However, if I enter 'somatic only', it throws an error because the "_" is missing. @karissawhiting - also done. Use `rlang::arg_match()` if you don't want partial matching!
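
A small sketch of the difference between partial and exact matching. `pick_mut_type()` is a hypothetical wrapper, and the option list is shortened for illustration ("omit_germline" is an assumed option name; only "somatic_only" appears in this issue).

```r
library(rlang)

# Assumed, shortened option list for illustration only.
choices <- c("omit_germline", "somatic_only")

# Base R match.arg() accepts unambiguous prefixes, so "s" is expanded.
match.arg("s", choices)

# Hypothetical wrapper: rlang::arg_match() requires an exact match.
pick_mut_type <- function(mut_type = c("omit_germline", "somatic_only")) {
  arg_match(mut_type)
}

pick_mut_type("somatic_only")        # accepted
try(pick_mut_type("s"))              # error: no partial matching
try(pick_mut_type("somatic only"))   # error: the "_" is missing
```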

4. Cases where `mutationStatus` is 'NA' or blank are included when using `mut_type = 'somatic_only'`. @karissawhiting

I think this is the correct behavior. These are almost always actually somatic; they just don't have a matched normal. So I think we should assume somatic, throw the warning, and include them in somatic only, as is currently being done.

5. The message below appears regardless of the actual data:

! 7 mutations have NA or blank in mutation status column instead of 'SOMATIC' or 'GERMLINE'. These were assumed to be 'SOMATIC' and were retained in the resulting binary matrix. @hfuchs5

Example: convert to a binary matrix using 10 sample IDs from `mut`.

    library(gnomeR)
    library(dplyr)

    mut <- gnomeR::mutations
    mut_valid_sample_ids <- unique(mut$sampleId)[1:10]
    sub <- create_gene_binary(samples = mut_valid_sample_ids, mutation = mut)

Merge and check the mutation status for the 10 samples. The ID variable in `sub` needs to be renamed so the mutations data can be restricted to the 10 samples.

    sub_c <- sub %>% rename("sampleId" = "sample_id")
    mut_10 <- merge(x = sub_c, y = mut, by = "sampleId")

Check the 10 samples using the code below. I don't see any missing cases. However, when creating `sub`, I get a message that there were 7 cases with missing or NA mutation status.

    mut_10 %>% select(mutationStatus)
    mut_10 %>% select(mutationStatus) %>% unique()
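
One possible explanation, stated as an assumption and not verified against the gnomeR source, is that the NA/blank count is taken on the full mutation data before it is restricted to the requested samples. The toy data below is made up for illustration and shows how the two counts can differ.

```r
# Toy data, made up for illustration; this is not gnomeR::mutations.
mut_toy <- data.frame(
  sampleId = c("s1", "s1", "s2", "s3", "s3"),
  mutationStatus = c("SOMATIC", "SOMATIC", "GERMLINE", NA, ""),
  stringsAsFactors = FALSE
)
selected <- c("s1", "s2")

# TRUE where the status is NA or blank
is_blank <- function(x) is.na(x) | trimws(x) == ""

# Count on the full data versus count on the selected samples only
n_before <- sum(is_blank(mut_toy$mutationStatus))
n_after <- sum(is_blank(mut_toy$mutationStatus[mut_toy$sampleId %in% selected]))
```

Here `n_before` counts the two blank rows from `s3`, while `n_after` is zero because neither selected sample has a missing status. A message built from the first count would appear even though the selected samples are complete.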

akriti21, Apr 27 '23 13:04