IRanges
                                
                                 IRanges copied to clipboard
                                
                                    IRanges copied to clipboard
                            
                            
                            
                        Make the IRanges() constructor compatible with ALTREP
(Moving this from the devteam-bioc Slack to GitHub)
Jiefei wrote on Slack:
Hi Herve, I would like to make some changes to the IRanges package but I might miss something so I want to hear your opinion. For making the IRanges package compatible with ALTREP, the IRanges should be able to use the user's input as its start and width variable, so for the C level constructor:
SEXP solve_user_SEW0(SEXP start, SEXP end, SEXP width)
{
    SEXP ans, ans_start, ans_width;
    int ans_length, i;
    ans_length = LENGTH(start);
    PROTECT(ans_start = NEW_INTEGER(ans_length));
    PROTECT(ans_width = NEW_INTEGER(ans_length));
    for (i = 0; i < ans_length; i++) {
        if (solve_user_SEW0_row(INTEGER(start)[i],
                    INTEGER(end)[i],
                    INTEGER(width)[i],
                    INTEGER(ans_start) + i,
                    INTEGER(ans_width) + i) != 0)
        {
            UNPROTECT(2);
            error("solving row %d: %s", i + 1, errmsg_buf);
        }
    }
    PROTECT(ans = _new_IRanges("IRanges", ans_start, ans_width, R_NilValue));
    UNPROTECT(3);
    return ans;
}
I plan to change it to
SEXP solve_user_SEW0(SEXP start, SEXP end, SEXP width)
{
    SEXP ans;
    int duplicate_num = 0;
    int ans_length = LENGTH(start);
    for (int i = 0; i < ans_length; i++) {
        if (solve_user_SEW0_row(i, 
            &start,&end, &width,
            &duplicate_num
        ) != 0)
        {
            error("solving row %d: %s", i + 1, errmsg_buf);
        }
    }
    PROTECT(ans = _new_IRanges("IRanges", start, width, R_NilValue));
    UNPROTECT(1+ duplicate_num);
    return ans;
}
The function
solve_user_SEW0_rowwill duplicatestartandwidthif it need to change the value of these two variables. The similar pattern is also found in the functionsolve_user_SEW. I would like to know if there is any concern that prevents us from using the user's input as the IRanges object's slots. Do you think these changes could be a solution for the ALTREP problem?
Sure. Not duplicating the user input when it doesn't need to be modified sounds like a good general principle anyway, even without ALTREP in the picture. I will suggest a slightly different implementation though. I'm not a big fan of the duplicate_num thing. I found it cleaner and much easier to maintain in the long run to do the UNPROTECTs in the function body that did the PROTECTs. In other words, I've always tried to have each function clean-up after itself rather than delegating the cleaning to the caller. I have a meeting in 20s (at 3pm EST) but will propose something after that.
@Jiefei-Wang
Hi Jiefei,
This is done in IRanges 2.19.12: https://github.com/Bioconductor/IRanges/commit/6eb40e6a0d2dc9bccf8fdffc2b91f282bb023c94
Note the two-pass approach. The idea is to do a first pass in order to:
- Determine whether the supplied startandwidthcontain NAs. If they do, they can't be used as-is.
- Make sure that all the supplied ranges are valid. This is better done before allocating the new startand/orwidthvectors (better to fail quickly rather than after a possibly costly memory allocation).
With this version of IRanges:
library(IRanges)
library(XVector)  # for XVector:::address()
start <- 101:105
width <- 7:3
ir1 <- IRanges(start, width=width)
stopifnot(XVector:::address(start(ir1)) == XVector:::address(start))
stopifnot(XVector:::address(width(ir1)) == XVector:::address(width))
width <- c(7:5, NA, NA)
ir2 <- IRanges(start, width=width)
# Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges") : 
#   In range 4: at least two out of 'start', 'end', and 'width', must
#   be supplied.
end <- c(NA, NA, NA, 999, 104)
ir3 <- IRanges(start, end, width)
stopifnot(XVector:::address(start(ir3)) == XVector:::address(start))
stopifnot(XVector:::address(width(ir3)) != XVector:::address(width))
start <- c(101:104, NA)
width <- 7:3
end <- c(NA, NA, NA, NA, 10)
ir4 <- IRanges(start, end, width)
stopifnot(XVector:::address(start(ir4)) != XVector:::address(start))
stopifnot(XVector:::address(width(ir4)) == XVector:::address(width))
Let me know how that works in the ALTREP context.
H.
Hi @hpages ,
From our discussion last week, we can make use of ALTREP if the code can operate on SEXPs instead of data pointers. Would you like me to apply these changes and make a pull request? I will preserve the code structure as possible as I can.
Best, Jiefei
Sounds good Jiefei. Thanks!