bigsnpr
bigsnpr copied to clipboard
fix rsid-based `snp_asGeneticPos()`
spline()
will complain about non-monotonous inputs otherwise. Tested on the provided list of hm3 variants map.rds (https://ndownloader.figshare.com/files/25503788)
Hi Florian
Thanks for your ongoing work on the package, we're using ldpred2 quite a bit (manuscripts forthcoming).
I was glad to see the added functionality for estimating genetic distances based on rsIDs, as I'd been working on hg38 builds for a while and wondering about the hg19-based genetic maps.
However, when trying to evaluate and implement the rsID-based mapping, I ran into some issues. I'll try and demonstrate with a hopefully reproducible example based on the list of hapmap3 variants you provide on figshare. I hope you agree that this is a suitable example dataset as a) it is already available online and b) it's your recommended set of variants that any analysis should filter on anyway.
library(dplyr)
library(ggplot2)
library(bigsnpr)
# Feel free to change these two. This is the fully reproducilbe version.
test_variants <- readRDS(url("https://ndownloader.figshare.com/files/25503788"))
gen_map_dir <- "./genetic_map_testing"
if (! dir.exists(gen_map_dir)) {
dir.create(gen_map_dir)
}
dist_rsid <- snp_asGeneticPos(test_variants$chr,
test_variants$pos_hg38,
rsid = test_variants$rsid,
dir = gen_map_dir)
# Error in { : task 1 failed - "'y' must be increasing or decreasing"
A look around the function code for snp_asGeneticPos()
revealed that this error message is likely to come from the stats::spline()
function used to extrapolate genetic distances for variants whose rsID could not be matched.
There is a bit more code below, but the gist of it is that adding ties = "ordered"
to the stats::spline()
call solves the issue. You're welcome to read on, but I'm afraid there is no big eureka moment, I basically stumbled upon the solution and think it works as a fix.
Code continued: As the error message is a bit vague, I was curious whether the error could be narrowed down to just one chromosome (snp_asGeneticPos()
splits the chromosomes internally anyway):
for (current_chromosome in 1:22) {
cat("Current bigsnpr function - Chromosome", current_chromosome, "\n")
sub_df <- filter(test_variants, chr == current_chromosome)
try(expr =
{
sub_df$dist_rsid <- snp_asGeneticPos(sub_df$chr,
sub_df$pos_hg38,
rsid = sub_df$rsid,
dir = gen_map_dir)
}
)
}
# Current bigsnpr function - Chromosome 1
# Error in { : task 1 failed - "'y' must be increasing or decreasing"
# Current bigsnpr function - Chromosome 2
# Error in { : task 1 failed - "'y' must be increasing or decreasing"
# Current bigsnpr function - Chromosome 3
# Current bigsnpr function - Chromosome 4
# Current bigsnpr function - Chromosome 5
# Current bigsnpr function - Chromosome 6
# Error in { : task 1 failed - "'y' must be increasing or decreasing"
# Current bigsnpr function - Chromosome 7
# Error in { : task 1 failed - "'y' must be increasing or decreasing"
# Current bigsnpr function - Chromosome 8
# Error in { : task 1 failed - "'y' must be increasing or decreasing"
# Current bigsnpr function - Chromosome 9
# Current bigsnpr function - Chromosome 10
# Error in { : task 1 failed - "'y' must be increasing or decreasing"
# Current bigsnpr function - Chromosome 11
# Current bigsnpr function - Chromosome 12
# Error in { : task 1 failed - "'y' must be increasing or decreasing"
# Current bigsnpr function - Chromosome 13
# Current bigsnpr function - Chromosome 14
# Current bigsnpr function - Chromosome 15
# Error in { : task 1 failed - "'y' must be increasing or decreasing"
# Current bigsnpr function - Chromosome 16
# Current bigsnpr function - Chromosome 17
# Current bigsnpr function - Chromosome 18
# Current bigsnpr function - Chromosome 19
# Current bigsnpr function - Chromosome 20
# Current bigsnpr function - Chromosome 21
# Current bigsnpr function - Chromosome 22
The error does not occur every single time, but only for chromosomes 1, 2, 6, 7, 8, 10, 12, and 15. Since the error is caused by input vectors for stats::spline()
not being monotonous, my first attempt at fixing things was to sort the genomic positions before running snp_asGeneticPos()
:
for (current_chromosome in 1:22) {
cat("Current bigsnpr function with sorted input - Chromosome", current_chromosome, "\n")
sub_df <- filter(test_variants, chr == current_chromosome) %>% arrange(pos_hg38)
try(expr =
{
sub_df$dist_rsid <- snp_asGeneticPos(sub_df$chr,
sub_df$pos_hg38,
rsid = sub_df$rsid,
dir = gen_map_dir)
}
)
}
# Current bigsnpr function with sorted input - Chromosome 1
# Error in { : task 1 failed - "'y' must be increasing or decreasing"
# Current bigsnpr function with sorted input - Chromosome 2
# Error in { : task 1 failed - "'y' must be increasing or decreasing"
# Current bigsnpr function with sorted input - Chromosome 3
# Current bigsnpr function with sorted input - Chromosome 4
# Current bigsnpr function with sorted input - Chromosome 5
# Current bigsnpr function with sorted input - Chromosome 6
# Error in { : task 1 failed - "'y' must be increasing or decreasing"
# Current bigsnpr function with sorted input - Chromosome 7
# Error in { : task 1 failed - "'y' must be increasing or decreasing"
# Current bigsnpr function with sorted input - Chromosome 8
# Error in { : task 1 failed - "'y' must be increasing or decreasing"
# Current bigsnpr function with sorted input - Chromosome 9
# Current bigsnpr function with sorted input - Chromosome 10
# Error in { : task 1 failed - "'y' must be increasing or decreasing"
# Current bigsnpr function with sorted input - Chromosome 11
# Current bigsnpr function with sorted input - Chromosome 12
# Error in { : task 1 failed - "'y' must be increasing or decreasing"
# Current bigsnpr function with sorted input - Chromosome 13
# Current bigsnpr function with sorted input - Chromosome 14
# Current bigsnpr function with sorted input - Chromosome 15
# Error in { : task 1 failed - "'y' must be increasing or decreasing"
# Current bigsnpr function with sorted input - Chromosome 16
# Current bigsnpr function with sorted input - Chromosome 17
# Current bigsnpr function with sorted input - Chromosome 18
# Current bigsnpr function with sorted input - Chromosome 19
# Current bigsnpr function with sorted input - Chromosome 20
# Current bigsnpr function with sorted input - Chromosome 21
# Current bigsnpr function with sorted input - Chromosome 22
No changes here. I figured I'd have to dig a bit deeper, so I tried to directly run the internal code from snp_asGeneticPos()
and see what's going on:
# Rename variables to match names inside snp_asGeneticPos() function
pos <- test_variants$pos_hg38
rsid <- test_variants$rsid
for (chr in 1:22) {
cat("Code snippet from bigsnpr function - Chromosome", chr, "\n")
try(expr =
{
ind.chr <- which(test_variants$chr == chr)
# Files need to be downloaded! The earlier `snp_asGeneticPos()` call with `dir = gen_map_dir`
# should have taken care of the download
map.chr <- bigreadr::fread2(paste0(gen_map_dir, "/chr", chr, ".OMNI.interpolated_genetic_map"), showProgress = FALSE)
# by rsid, lines 296-305
ind <- match(rsid[ind.chr], map.chr$V1)
new_pos <- map.chr$V3[ind]
indNA <- which(is.na(ind))
if (length(indNA) > 0) {
pos.chr <- pos[ind.chr]
new_pos[indNA] <- suppressWarnings(
stats::spline(pos.chr, new_pos, xout = pos.chr[indNA], method = "hyman")$y)
}
}
)
}
# Code snippet from bigsnpr function - Chromosome 1
# Error in stats::spline(pos.chr, new_pos, xout = pos.chr[indNA], method = "hyman") :
# 'y' must be increasing or decreasing
# Code snippet from bigsnpr function - Chromosome 2
# Error in stats::spline(pos.chr, new_pos, xout = pos.chr[indNA], method = "hyman") :
# 'y' must be increasing or decreasing
# Code snippet from bigsnpr function - Chromosome 3
# Code snippet from bigsnpr function - Chromosome 4
# Code snippet from bigsnpr function - Chromosome 5
# Code snippet from bigsnpr function - Chromosome 6
# Error in stats::spline(pos.chr, new_pos, xout = pos.chr[indNA], method = "hyman") :
# 'y' must be increasing or decreasing
# Code snippet from bigsnpr function - Chromosome 7
# Error in stats::spline(pos.chr, new_pos, xout = pos.chr[indNA], method = "hyman") :
# 'y' must be increasing or decreasing
# Code snippet from bigsnpr function - Chromosome 8
# Error in stats::spline(pos.chr, new_pos, xout = pos.chr[indNA], method = "hyman") :
# 'y' must be increasing or decreasing
# Code snippet from bigsnpr function - Chromosome 9
# Code snippet from bigsnpr function - Chromosome 10
# Error in stats::spline(pos.chr, new_pos, xout = pos.chr[indNA], method = "hyman") :
# 'y' must be increasing or decreasing
# Code snippet from bigsnpr function - Chromosome 11
# Code snippet from bigsnpr function - Chromosome 12
# Error in stats::spline(pos.chr, new_pos, xout = pos.chr[indNA], method = "hyman") :
# 'y' must be increasing or decreasing
# Code snippet from bigsnpr function - Chromosome 13
# Code snippet from bigsnpr function - Chromosome 14
# Code snippet from bigsnpr function - Chromosome 15
# Error in stats::spline(pos.chr, new_pos, xout = pos.chr[indNA], method = "hyman") :
# 'y' must be increasing or decreasing
# Code snippet from bigsnpr function - Chromosome 16
# Code snippet from bigsnpr function - Chromosome 17
# Code snippet from bigsnpr function - Chromosome 18
# Code snippet from bigsnpr function - Chromosome 19
# Code snippet from bigsnpr function - Chromosome 20
# Code snippet from bigsnpr function - Chromosome 21
# Code snippet from bigsnpr function - Chromosome 22
This confirmed that the error comes from stats::spline()
. Looking at the function code for stats::spline()
, I was able to see how monotony is checked:
stats::spline
# function (x, y = NULL, n = 3 * length(x), method = "fmm", xmin = min(x),
# xmax = max(x), xout, ties = mean)
# {
# method <- pmatch(method, c("periodic", "natural", "fmm",
# "hyman"))
# if (is.na(method))
# stop("invalid interpolation method")
# x <- regularize.values(x, y, ties, missing(ties))
# y <- x$y
# x <- x$x
# nx <- length(x)
# if (is.na(nx))
# stop(gettextf("invalid value of %s", "length(x)"), domain = NA)
# if (nx == 0)
# stop("zero non-NA points")
# if (method == 1L && y[1L] != y[nx]) {
# warning("spline: first and last y values differ - using y[1] for both")
# y[nx] <- y[1L]
# }
# if (method == 4L) {
# dy <- diff(y)
# if (!(all(dy >= 0) || all(dy <= 0)))
# stop("'y' must be increasing or decreasing")
# }
# if (missing(xout))
# xout <- seq.int(xmin, xmax, length.out = n)
# else n <- length(xout)
# if (n <= 0L)
# stop("'spline' requires n >= 1")
# xout <- as.double(xout)
# z <- .Call(C_SplineCoef, min(3L, method), x, y)
# if (method == 4L)
# z <- spl_coef_conv(hyman_filter(z))
# list(x = xout, y = .Call(C_SplineEval, xout, z))
# }
# <bytecode: 0x7fc5a7b29fd8>
# <environment: namespace:stats>
The relevant snippet is
if (method == 4L) {
dy <- diff(y)
if (!(all(dy >= 0) || all(dy <= 0)))
stop("'y' must be increasing or decreasing")
}
However, the values for x
and y
provided to stats::spline()
are parsed through the function call x <- regularize.values(x, y, ties, missing(ties))
. I found it really difficult to find documentation for this function, but either way, running the whole thing with ties = "ordered"
added fixes the errors:
tracked_results <- tibble(chr = as.integer(),
pos = as.integer(),
rsid = as.character(),
na_rsid = as.logical(),
dist_rsid = as.numeric())
for (chr in 1:22) {
cat("Fixed snippet from bigsnpr function - Chromosome", chr, "\n")
try(expr =
{
ind.chr <- which(test_variants$chr == chr)
map.chr <- bigreadr::fread2(paste0(gen_map_dir, "/chr", chr, ".OMNI.interpolated_genetic_map"), showProgress = FALSE)
# by rsid, lines 296-305
ind <- match(rsid[ind.chr], map.chr$V1)
new_pos <- map.chr$V3[ind]
indNA <- which(is.na(ind))
if (length(indNA) > 0) {
pos.chr <- pos[ind.chr]
new_pos[indNA] <- suppressWarnings(
stats::spline(pos.chr, new_pos, xout = pos.chr[indNA], method = "hyman", ties = "ordered")$y)
}
# Log results and missing rsids
tracked_results <- bind_rows(tracked_results, tibble(chr = chr,
pos = pos[ind.chr],
rsid = rsid[ind.chr],
na_rsid = is.na(ind),
dist_rsid = new_pos))
}
)
}
# Fixed snippet from bigsnpr function - Chromosome 1
# Fixed snippet from bigsnpr function - Chromosome 2
# Fixed snippet from bigsnpr function - Chromosome 3
# Fixed snippet from bigsnpr function - Chromosome 4
# Fixed snippet from bigsnpr function - Chromosome 5
# Fixed snippet from bigsnpr function - Chromosome 6
# Fixed snippet from bigsnpr function - Chromosome 7
# Fixed snippet from bigsnpr function - Chromosome 8
# Fixed snippet from bigsnpr function - Chromosome 9
# Fixed snippet from bigsnpr function - Chromosome 10
# Fixed snippet from bigsnpr function - Chromosome 11
# Fixed snippet from bigsnpr function - Chromosome 12
# Fixed snippet from bigsnpr function - Chromosome 13
# Fixed snippet from bigsnpr function - Chromosome 14
# Fixed snippet from bigsnpr function - Chromosome 15
# Fixed snippet from bigsnpr function - Chromosome 16
# Fixed snippet from bigsnpr function - Chromosome 17
# Fixed snippet from bigsnpr function - Chromosome 18
# Fixed snippet from bigsnpr function - Chromosome 19
# Fixed snippet from bigsnpr function - Chromosome 20
# Fixed snippet from bigsnpr function - Chromosome 21
# Fixed snippet from bigsnpr function - Chromosome 22
For a quick sanity check, I plot hg38 positions versus the estimated genetic distances:
tracked_results %>%
ggplot(aes(x = pos, y = dist_rsid)) +
geom_point() +
facet_wrap(facets = vars(chr), scales = "free", nrow = 5) +
labs(x = "hg38 position", y = "distance based on rsIDs")
There is something weird going on in chromosome 7, this is due to the liftover to hg38 returning nucleotide positions for an alternative scaffold (I see the same in my own liftover results of your previous hm3 list). While this looks off when plotting against hg38 positions, the genetic distances here are correct I'd say. We can have a look at these outliers:
test_variants %>%
mutate(liftover_diff = abs(pos_hg38 - pos)) %>%
arrange(desc(liftover_diff))
## A tibble: 1,054,330 x 11
# chr pos a0 a1 rsid af_UKBB ld pos_hg17 pos_hg18 pos_hg38 liftover_diff
# <int> <int> <chr> <chr> <chr> <dbl> <dbl> <int> <int> <int> <int>
# 1 7 142275743 C T rs949425 0.102 16.7 141531051 141724336 339802 141935941
# 2 7 142274854 G A rs2854536 0.297 16.0 141531940 141725225 340691 141934163
# 3 7 142274092 A G rs2854540 0.768 17.9 141532702 141725987 341453 141932639
# 4 7 142274025 T C rs2854541 0.739 18.0 141532769 141726054 341520 141932505
# 5 7 142273430 C T rs361487 0.132 6.33 141533702 141726987 342115 141931315
# 6 7 142273128 T C rs361488 0.736 18.1 141534004 141727289 342417 141930711
# 7 7 142272619 A C rs2855882 0.227 15.5 141534513 141727798 342926 141929693
# 8 7 142261439 G A rs2734060 0.169 20.8 141545697 141738982 354112 141907327
# 9 7 142261012 A G rs2734062 0.831 20.8 141546124 141739409 354539 141906473
#10 7 142260736 G A rs2734063 0.904 19.0 141546400 141739685 354815 141905921
## … with 1,054,320 more rows
When checking the first rsID listed here, we can see that the hg38 position in dbsnp is more in line with the positions in all the other builds: https://www.ncbi.nlm.nih.gov/snp/rs949425
Thanks for this investigation. If I understand correctly, the problem arises because the genetic positions are not monotonic with the hg38 positions.
Then the question is: do we really want to use the hg38 positions at all? Another solution would be to simply do linear interpolation. But there is a bit of an issue with the edges.
Hi again! Yes you are correct, the underlying reason must be non-monotony of the genetic positions and the hg38 positions.
I have looked at this issue a bit more and decided to implement the suggested fix (adding ties = "ordered"
to the spline()
call) in my own pipeline, however it turns out that the suggested fix does not actually fix the problem when I tested it on my real data (I'm not sure why that is, to be honest, as this is after hm3 filtering).
As for alternatives to the spline interpolation - what do you think about using a k-NN solution for the variants that need it? Find the closest variant (in hg38 position space) that was successfully matched via rsID and use the genetic distance of that closest neighbor. Unless I'm missing something, it seems like estimating the genetic distances without rsID works exactly like this already (albeit directly on the hg19 positions in the genetic map files).
Also, maybe worth mentioning that the number of variants any fix would apply to is rather small; the majority of variants can be succesfully matched through their rsID (chr 6 has the most non-matching variants):
library(tidyverse)
library(bigsnpr)
# Reproducible version, can point to local copies instead
test_variants <- readRDS(url("https://ndownloader.figshare.com/files/25503788"))
gen_map_dir <- "./genetic_map_testing"
all_gen_pos <- tibble(V1 = as.character(),
V2 = as.numeric(),
V3 = as.numeric())
# Loading all genetic map files, takes 2-3 minutes with just one core
for (chr in 1:22) {
all_gen_pos <- bind_rows(all_gen_pos,
bigreadr::fread2(paste0(gen_map_dir, "/chr", chr, ".OMNI.interpolated_genetic_map")))
}
(check_rsid_matches <- test_variants %>% left_join(all_gen_pos, by = c("rsid" = "V1")) %>%
rename(gen_dist = V3) %>%
select(-V2))
# # A tibble: 1,054,330 x 11
# chr pos a0 a1 rsid af_UKBB ld pos_hg17 pos_hg18 pos_hg38 gen_dist
# <int> <int> <chr> <chr> <chr> <dbl> <dbl> <int> <int> <int> <dbl>
# 1 1 752721 A G rs3131972 0.841 3.69 792584 742584 817341 0.0157
# 2 1 754182 A G rs3131969 0.870 3.73 794045 744045 818802 0.0167
# 3 1 760912 C T rs1048488 0.840 3.69 800775 750775 825532 0.0212
# 4 1 768448 G A rs12562034 0.106 1.40 808311 758311 833068 0.0263
# 5 1 779322 A G rs4040617 0.128 3.68 819185 769185 843942 0.0336
# 6 1 838555 C A rs4970383 0.245 3.57 878418 828418 903175 0.160
# 7 1 846808 C T rs4475691 0.198 5.20 886671 836671 911428 0.188
# 8 1 853954 C A rs1806509 0.608 4.45 893817 843817 918574 0.194
# 9 1 854250 A G rs7537756 0.210 5.70 894113 844113 918870 0.195
# 10 1 864938 G A rs2340587 0.762 5.10 904801 854801 929558 NA
# # … with 1,054,320 more rows
check_rsid_matches %>%
group_by(chr) %>%
summarise(matched = sum(!is.na(gen_dist)),
no_match = sum(is.na(gen_dist))) %>%
ungroup() %>%
mutate(match_ratio = matched / (matched + no_match)) %>%
print(n = 22)
# # A tibble: 22 x 4
# chr matched no_match match_ratio
# <int> <int> <int> <dbl>
# 1 1 86924 221 0.997
# 2 2 87281 240 0.997
# 3 3 73257 157 0.998
# 4 4 65215 275 0.996
# 5 5 66174 156 0.998
# 6 6 66194 4852 0.932
# 7 7 57888 148 0.997
# 8 8 56962 122 0.998
# 9 9 48355 120 0.998
# 10 10 56412 137 0.998
# 11 11 53825 140 0.997
# 12 12 51375 117 0.998
# 13 13 40019 111 0.997
# 14 14 35023 70 0.998
# 15 15 31678 90 0.997
# 16 16 32321 110 0.997
# 17 17 28442 329 0.989
# 18 18 31410 87 0.997
# 19 19 19748 55 0.997
# 20 20 27688 40 0.999
# 21 21 15066 47 0.997
# 22 22 15411 38 0.998
The thing with kNN is that it is okay because the data is so dense in the genetic map, which might not be the case when considering only the variants matched by rsid. Give me a few days to think about it (I'll get back to you by Friday max).
In the meantime, could you try your code with linear interpolation instead? (function approx()
I think).
Thanks.
Yes that's true, I was about to look at the location of the variants that cannot be matched by rsID. If they are all in the same region (suspect that might be the case for chr 6) then kNN is even more problematic.
Will report some findings!
What about using method = "fmm"
? It does not need monotonic positions.
My apologies, I've been preoccupied with other tasks. I can test spline(method = "fmm")
this afternoon if you like, but obviously it should fix the non-monotony problem.
So the only thing I would actually test then is whether or not the interpolated results look reasonable.
Yes, please tell me if results with this look reasonable.
Hi again! I have now been able to test linear interpolation as well as the "fmm" spline interpolation. Both of them fix the issue with non-monotony that is triggered by using the "hyman" spline interpolation.
Quick code snippet for running the "fmm" interpolation adn inspecting the results:
library(tidyverse)
library(bigsnpr)
test_variants <- readRDS(url("https://ndownloader.figshare.com/files/25503788"))
gen_map_dir <- "../ldpred2_resources/genetic_maps/" # change to point to your local copies
# Rename variables to match names inside snp_asGeneticPos() function
pos <- test_variants$pos_hg38
rsid <- test_variants$rsid
fmm_res <- tibble(chr = as.integer(),
pos = as.integer(),
rsid = as.character(),
na_rsid = as.logical(),
dist_rsid = as.numeric())
for (chr in 1:22) {
cat("Testing 'fmm' interpolation - Chromosome", chr, "\n")
try(expr =
{
ind.chr <- which(test_variants$chr == chr)
map.chr <- bigreadr::fread2(paste0(gen_map_dir, "/chr", chr, ".OMNI.interpolated_genetic_map"), showProgress = FALSE)
# by rsid, lines 296-305
ind <- match(rsid[ind.chr], map.chr$V1)
new_pos <- map.chr$V3[ind]
indNA <- which(is.na(ind))
if (length(indNA) > 0) {
pos.chr <- pos[ind.chr]
new_pos[indNA] <- suppressWarnings(
stats::spline(pos.chr, new_pos, xout = pos.chr[indNA], method = "fmm")$y)
}
# Log results and missing rsids
fmm_res <- bind_rows(fmm_res, tibble(chr = chr,
pos = pos[ind.chr],
rsid = rsid[ind.chr],
na_rsid = is.na(ind),
dist_rsid = new_pos))
}
)
}
fmm_res %>%
ggplot(aes(x = pos, y = dist_rsid)) +
geom_segment(data = filter(fmm_res, na_rsid), aes(xend = pos, yend = 0), alpha = 0.1, colour = "red") +
geom_line() +
facet_wrap(facets = vars(chr), scales = "free", nrow = 5) +
labs(x = "hg38 position", y = "genetic distance", title = "Estimating genetic distances via rsIDs & 'fmm' interpolation", subtitle = "red areas represent interpolated variants")
I don't know how critically you'd like to inspect the estimated distances, but from a high level, I would say these are fine - although two data points stand out. I'm showing the "fmm" results because they present these two outlying data points: 1) the very first variant on chromosome 9 ends up with an estimated genetic distance of -50; 2) the very last variant on chromosome 19 has a weird hg38 position (way too high) and ends up being estimated lower than the other rightmost variants on chr 19 (it's this SNP: https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?do_not_redirect&rs=rs34501900).
For the pipeline we are using, I think I'm likely to either a) use the linear interpolation, as approx()
allows handling cases that fall outside the interpolation interval with the rule
argument (return either NAs or the values at interval edge) or b) hack a solution where I just use the corresponding hg19 positions for the hg38 variants I want to convert to genetic distances (they are right there in the hm3 variant list, after all...)
Thanks for reporting.
I'll try when I have some time this week.
Dear developer and users,
I am having the same issue:
POS2 <- snp_asGeneticPos(CHR, POS, rsid = RSID, dir=".") Error in { : task 4 failed - "'y' must be increasing or decreasing"
How should I modify the code?
Many thanks!!
Paloma
I'll try to look at this again next week.
So, I think there are two okay solutions:
- Use a linear interpolation:
approx(pos.chr, new_pos, xout = pos.chr[indNA], rule = 2)$y
- Do not try to interpolate positions corresponding to rsids that have not been matched; these variants may not be easy to map and therefore we may not have a high confidence in them -> maybe better just to remove them (getting NAs for the genetic pos).
Honestly I would play the safe card by picking option 2. @timo-cpr What's your opinion on this?