miceRanger
Unicode characters in data column names throw an error in naWhere
I have the following data:
> head(htc, 2)
25 µL 50 µL 75 µL 100 µL Accession
1: 1.265836 0.02575365 0.1428066 0.2107820 A0A024R6I7
2: NA 0.01566025 0.1481060 0.2069585 A0A075B6K4
> dim(htc)
[1] 269 5
> htc[, colSums(is.na(.SD))]
25 µL 50 µL 75 µL 100 µL Accession
200 0 3 0 0
and the associated naWhere, varp and varn objects:
> naWhere[1:4, ]
25 µL 50 µL 75 µL 100 µL Accession
[1,] FALSE FALSE FALSE FALSE FALSE
[2,] TRUE FALSE FALSE FALSE FALSE
[3,] TRUE FALSE FALSE FALSE FALSE
> dim(naWhere)
[1] 269 5
> colSums(naWhere)
25 µL 50 µL 75 µL 100 µL Accession
200 0 3 0 0
> varp <- unique(unlist(vars))
> varp
[1] "50 μL" "75 μL" "100 μL" "Accession" "25 μL" ## maybe apply gtools::mixedsort ?
> varn
[1] "25 μL" "75 μL"
Calculating the left-out columns throws the following error:
> leftOut <- !varp %in% varn & colSums(naWhere[, varp]) > 0
Error in naWhere[, varp] : subscript out of bounds
Checking varp against colnames(naWhere):
> identical(varp, colnames(naWhere))
[1] FALSE
> intersect(varp, colnames(naWhere))
[1] "Accession"
> varp %in% colnames(naWhere)
[1] FALSE FALSE FALSE TRUE FALSE
> which(varp %in% colnames(naWhere)) ## only "Accession" matches
[1] 4
> which(colnames(naWhere) %in% varp) ## only "Accession" matches
[1] 5
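The printed output hints at why only "Accession" matches: the data column names appear to use µ (U+00B5, MICRO SIGN), while varp and varn appear to use μ (U+03BC, GREEK SMALL LETTER MU). The two glyphs look alike but are different characters, so lookups by name fail. The snippet below is a minimal, self-contained reproduction under that assumption; na_where and var_p are hypothetical stand-ins, not miceRanger's internal objects.

```r
# Assumption: data columns use MICRO SIGN, varp/varn use GREEK SMALL LETTER MU.
micro_sign <- "\u00b5"  # U+00B5
greek_mu   <- "\u03bc"  # U+03BC

utf8ToInt(micro_sign)  # 181
utf8ToInt(greek_mu)    # 956

# Hypothetical stand-in for naWhere, with MICRO SIGN in the column name.
na_where <- matrix(
  c(FALSE, TRUE, FALSE, FALSE),
  nrow = 2,
  dimnames = list(NULL, c(paste0("25 ", micro_sign, "L"), "Accession"))
)

# Hypothetical stand-in for varp, spelled with GREEK SMALL LETTER MU.
var_p <- c(paste0("25 ", greek_mu, "L"), "Accession")

var_p %in% colnames(na_where)  # FALSE TRUE: only "Accession" matches

# Indexing a matrix by a name that is not a column reproduces the error.
msg <- tryCatch(na_where[, var_p], error = function(e) conditionMessage(e))
msg  # "subscript out of bounds"
```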
Comparing varp against varn still works:
> !varp %in% varn
[1] TRUE FALSE TRUE TRUE FALSE
The error seems to be caused by the Unicode characters in the column names, although these characters cause no problem when comparing varp with varn, as the last command above shows. However, using either seq_along or base::enc2native seems to remove the error:
> leftOut <- !varp %in% varn & colSums(naWhere[, seq_along(varp)]) > 0
> leftOut
25 µL 50 µL 75 µL 100 µL Accession
TRUE FALSE TRUE FALSE FALSE
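Regarding the seq_along workaround above: naWhere[, seq_along(varp)] selects columns by position, not by name, so each element of !varp %in% varn is paired with whichever naWhere column sits at the same position, and the result takes its names from naWhere. Because varp is ordered differently from colnames(naWhere) here, the pairing may not line up. The toy example below uses hypothetical objects and plain base R to show the difference.

```r
# Hypothetical NA-indicator matrix: only column "a" has missing values.
na_where <- matrix(
  c(TRUE, TRUE, FALSE, FALSE),
  nrow = 2,
  dimnames = list(NULL, c("a", "b"))
)
var_p <- c("b", "a")  # same names as the columns, different order
var_n <- "a"          # "a" is imputed, so it should not be left out

# Positional: column 1 ("a") is paired with var_p[1] ("b").
by_position <- !var_p %in% var_n & colSums(na_where[, seq_along(var_p)]) > 0
by_position  # a = TRUE, b = FALSE: "a" is wrongly flagged

# By name: each var_p entry is paired with its own column.
by_name <- !var_p %in% var_n & colSums(na_where[, var_p]) > 0
by_name      # b = FALSE, a = FALSE
```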
> varp = enc2native(varp)
> leftOut <- !varp %in% varn & colSums(naWhere[, varp]) > 0
> leftOut
50 µL 75 µL 100 µL Accession 25 µL
FALSE TRUE FALSE FALSE TRUE
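In the enc2native output, "75 µL" and "25 µL" are flagged even though both appear in varn. That would be consistent with varp now matching the naWhere column names but no longer matching varn. One user-side workaround, sketched below under the MICRO SIGN vs GREEK MU assumption, is to map every set of names to a single character before comparing. The helper normalize_mu() and all objects are hypothetical; this is not miceRanger's API or fix.

```r
# Hypothetical helper: replace GREEK SMALL LETTER MU (U+03BC) with MICRO SIGN (U+00B5).
normalize_mu <- function(x) gsub("\u03bc", "\u00b5", x, fixed = TRUE)

# Hypothetical NA-indicator matrix with MICRO SIGN column names.
na_where <- matrix(
  c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  nrow = 2,
  dimnames = list(NULL, c("25 \u00b5L", "75 \u00b5L", "Accession"))
)
var_p <- c("75 \u03bcL", "Accession", "25 \u03bcL")  # GREEK MU spelling
var_n <- "25 \u03bcL"                                 # GREEK MU spelling

# Normalize all three sources of names the same way.
colnames(na_where) <- normalize_mu(colnames(na_where))
var_p <- normalize_mu(var_p)
var_n <- normalize_mu(var_n)

# Name-based lookup now succeeds and stays aligned with var_p.
left_out <- !var_p %in% var_n & colSums(na_where[, var_p]) > 0
left_out  # 75 µL = TRUE, Accession = FALSE, 25 µL = FALSE
```

Renaming the data columns to plain ASCII (for example "25 uL") before calling miceRanger would sidestep the issue in the same way.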
Please advise, thank you!