gnomeR
Minor bugs in `create_gene_binary()`
Creating one issue with all feedback, as I'm not sure which of these we actually want to work on as next steps.
✅1 As of now, the function breaks if the user passes a data frame to `samples` instead of a vector, and gives the error "None of your selected samples have alterations in your data."
We can check the class of the `samples` argument first and, if it's not a vector, throw a more specific error saying it must be a vector?
@karissawhiting - Just checked, this is done!
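For reference, a class check of the kind suggested in item 1 could look roughly like the sketch below. `check_samples()` is a hypothetical helper written for illustration, not the check that was actually added to gnomeR, and it assumes sample IDs are always supplied as a character vector.

```r
# Hypothetical helper (not part of gnomeR): fail early with a specific
# message when `samples` is not a character vector of sample IDs.
check_samples <- function(samples) {
  if (!is.character(samples)) {
    stop(
      "`samples` must be a character vector of sample IDs, not an object of class ",
      paste(class(samples), collapse = "/"), ".",
      call. = FALSE
    )
  }
  invisible(samples)
}

# A character vector passes silently
check_samples(c("P-0001", "P-0002"))

# A data frame is rejected; capture the message instead of stopping the script
msg <- tryCatch(
  check_samples(data.frame(id = "P-0001")),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Checking before any filtering means the user sees the real cause rather than the downstream "no alterations" error.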
2 No warning/message/error if the user enters duplicate sample IDs. The function returns unique rows for mutation data but not for CNA and fusion. @hfuchs5
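One way item 2 could be handled is to warn and de-duplicate the IDs up front, so mutation, CNA and fusion data are all filtered with the same unique vector. The helper below is a hypothetical sketch, not existing gnomeR code.

```r
# Hypothetical helper (not part of gnomeR): warn about repeated sample IDs
# and return each ID once, in order of first appearance.
dedupe_samples <- function(samples) {
  dups <- unique(samples[duplicated(samples)])
  if (length(dups) > 0) {
    warning(
      "Duplicate sample IDs were found and dropped: ",
      paste(dups, collapse = ", "),
      call. = FALSE
    )
  }
  unique(samples)
}

# "s1" appears twice, so a warning is raised and one copy is kept
ids <- suppressWarnings(dedupe_samples(c("s1", "s2", "s1")))
print(ids)
```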
✅3 The `mut_type` argument works fine to filter for somatic when mut_type = 's' (case sensitive).
So if you enter mut_type = 's' or mut_type = 'somatic_only', it works fine and returns all data. However, if I enter 'somatic only', it throws an error because the "_" is missing.
@karissawhiting - also done. Use `rlang::arg_match()` if you don't want partial matching!
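To illustrate the difference mentioned above, here is a small comparison of base R `match.arg()`, which accepts unique prefixes, and `rlang::arg_match()`, which requires the full value. The two wrapper functions are hypothetical, and the set of allowed values is assumed for illustration (only 'somatic_only' is named in this issue).

```r
# Assumed set of allowed values, for illustration only
allowed <- c("omit_germline", "somatic_only", "germline_only", "all")

# Hypothetical wrapper using base R: a unique prefix such as "s" is accepted
pick_partial <- function(mut_type = allowed) {
  match.arg(mut_type)
}

# Hypothetical wrapper using rlang: only an exact value is accepted
pick_exact <- function(mut_type = allowed) {
  rlang::arg_match(mut_type)
}

partial_result <- pick_partial("s")
exact_result <- pick_exact("somatic_only")

# Both reject "somatic only" (missing underscore); only pick_exact rejects "s"
bad_space <- tryCatch(pick_partial("somatic only"), error = function(e) "error")
bad_prefix <- tryCatch(pick_exact("s"), error = function(e) "error")
```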
✅4 Cases where mutationStatus is 'NA' or blank are included when using mut_type = 'somatic_only' @karissawhiting
I think this is the correct behavior. These are almost always actually somatic; they just don't have a matched normal. So I think we should assume somatic, throw the warning, and include them in somatic only, as is currently being done.
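The behavior described in item 4 can be sketched as follows. This is an illustrative stand-alone function, not the gnomeR implementation; it assumes a data frame with a character `mutationStatus` column, and treats `NA`, the string "NA", and blank strings as missing.

```r
# Hypothetical sketch (not gnomeR internals): recode missing mutation status
# to "SOMATIC" and warn with the number of affected rows.
assume_somatic <- function(mutation) {
  status <- mutation$mutationStatus
  missing_status <- is.na(status) | trimws(status) == "" | status == "NA"
  n_missing <- sum(missing_status)
  if (n_missing > 0) {
    warning(
      n_missing, " mutations have NA or blank mutation status. ",
      "These were assumed to be 'SOMATIC'.",
      call. = FALSE
    )
    mutation$mutationStatus[missing_status] <- "SOMATIC"
  }
  mutation
}

# Toy data, made up for illustration
toy <- data.frame(
  sampleId = c("s1", "s2", "s3", "s4"),
  mutationStatus = c("SOMATIC", NA, "", "GERMLINE"),
  stringsAsFactors = FALSE
)
recoded <- suppressWarnings(assume_somatic(toy))
```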
✅5 Message below appears regardless of actual data
! 7 mutations have NA or blank in mutation status column instead of 'SOMATIC' or 'GERMLINE'. These were assumed to be 'SOMATIC' and were retained in the resulting binary matrix.
@hfuchs5
Example - Convert to a binary matrix using 10 sample IDs from mut:

    library(gnomeR)
    library(dplyr)
    mut <- gnomeR::mutations
    mut_valid_sample_ids <- unique(mut$sampleId)[1:10]
    sub <- create_gene_binary(samples = mut_valid_sample_ids, mutation = mut)

Merge and check the mutation status for the 10 samples. The ID variable in sub needs to be renamed so the mutations data can be restricted to the 10 samples:

    sub_c <- sub %>% rename("sampleId" = "sample_id")
    mut_10 <- merge(x = sub_c, y = mut, by = "sampleId")

Check the 10 samples using the code below. I don't see any missing cases. However, on creating the data 'sub', I get a message that there were 7 cases with missing or NA mutation status.

    mut_10 %>% select(mutationStatus)
    mut_10 %>% select(mutationStatus) %>% unique()
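One possible explanation, which is a guess and not confirmed in this thread, is that the missing-status count is computed on the full mutations data before it is restricted to the requested samples. A quick way to test that idea is to compare the count on all rows with the count on the selected samples only. The sketch below uses made-up toy data and a hypothetical helper `count_missing_status()`; the same comparison could be run on the real data.

```r
# Hypothetical helper: count rows with NA or blank mutation status,
# optionally restricted to a set of sample IDs.
count_missing_status <- function(mutation, samples = NULL) {
  if (!is.null(samples)) {
    mutation <- mutation[mutation$sampleId %in% samples, , drop = FALSE]
  }
  status <- mutation$mutationStatus
  sum(is.na(status) | trimws(status) == "")
}

# Toy data: only sample "s3" has missing status
toy <- data.frame(
  sampleId = c("s1", "s1", "s2", "s3", "s3"),
  mutationStatus = c("SOMATIC", "GERMLINE", "SOMATIC", NA, ""),
  stringsAsFactors = FALSE
)

n_all <- count_missing_status(toy)                             # all rows
n_selected <- count_missing_status(toy, samples = c("s1", "s2"))  # selected samples only
```

If the count on all rows matches the number in the message while the count on the selected samples is zero, the message is most likely being generated before the sample filter is applied.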