quantmod
quantmod copied to clipboard
OHLC returns wrong/duplicate columns
Description
The issue occurs when multiple columns in the data set match the OHLC-names. The OHLC function returns each matching column. In the test case the data set contains a column from Stochastic oscillator (TTR function) with the name "slowD". As a result OHLC returns; "AAPL.Open", "AAPL.High", "AAPL.Low", "slowD", "AAPL.Close" (from a data set with 20 columns).
Expected behavior
Expect the OHLC function to return the Open
, High
, Low
and Close
data, nothing else.
Minimal, reproducible example
aapl <- quantmod::getSymbols("AAPL", from = "2019-01-01", auto.assign = FALSE)
stoch <- TTR::stoch(aapl[, c("AAPL.Close", "AAPL.Low", "AAPL.Close")])
x <- merge(aapl, stoch)
colnames(quantmod::OHLC(x))
In the source the "Low" column is found by grep('Low',colnames(x),ignore.case=TRUE)
.
Possible solution(s) can be to first attempt more strict matches, alternative to always return the first result grep('Low',colnames(x),ignore.case=TRUE)[1]
.
This is similar to #24. Have you thought through the discussion there?
Hi, sorry I only checked against the open issues. Your suggestion to implement with test coverage makes sense. Would it be okay to use testthat package (current test are not using it)? If you are open to implement the improvement, I can prepare some tests.
No worries! I only wanted you to be aware of the history, so you could take it into account while thinking about the issue.
It would be great if you could write some tests, especially if you can think of any other edge cases that aren't mentioned in the other issue.
You can write them in the testthat style and/or use testthat locally. But I would use tinytest instead, so don't spend time getting the package to build/check with testthat.
In preparing test cases I do not get a positive result from has.Op
using an object returned by getSymbols
with column name changed to "Op". Given first three lines I would expect a TRUE
(see below). In general attr(x, "some_attribute")
returns NULL
while attributes
show the specific attribute. Btw is has.Op
inconsistent with Op
in this perspective? What is the expected behaviour?
`has.Op` <-
function(x,which=FALSE)
{
colAttr <- attr(x, "Op")
if(!is.null(colAttr))
return(if(which) colAttr else TRUE)
loc <- grep('Open',colnames(x),ignore.case=TRUE)
if(!identical(loc,integer(0))) {
return(if(which) loc else TRUE)
} else FALSE
}
Based upon the test cases analysed from #24 and #305, the following solution should work.
`has.Op` <-
function(x,which=FALSE)
{
cols <- colnames(x)
n <- which(cols %in% c("Op", "Open"))
if(identical(n,integer(0))) {
n <- grep('Open$',cols, ignore.case = TRUE)
}
return(if(which) n else !identical(n,integer(0)))
}
And below solution for the related function (avoiding code duplication).
`Op` <-
function(x)
{
col <- has.Op(x, which = TRUE)
if(!identical(col, integer(0))) {
return(x[, col])
} else {
stop('subscript out of bounds: open column not found')
}
}
If you think this makes sense I can prepare a pull request tomorrow with the changes for OHLC functions and including tinytest cases.
Created pull request #306 to improve the column matching (with test cases).
Created pull request #306 to improve the column matching (with test cases).
I'm sorry I didn't respond earlier. I started working on the comments below on Saturday morning (which was still later than I should have). I'm going to leave these comments here, and I'll add the relevant comments to the PR.
Please keep in mind that I had most of the points below in mind before you created your PR. They're based on the code in your comments in this issue, so some points might not apply to your PR.
If you think this makes sense I can prepare a pull request tomorrow with the changes for OHLC functions and including tinytest cases.
I would appreciate a PR! Please take a look at the contributing guide before you get started.
Some notes to keep in mind:
- Please do not remove the column attribute handling in the
has.*
functions. That functionality has been there since 2011. Removing it could break existing code. -
n <- which(cols %in% c("Op", "Open"))
will usually beinteger(0)
. I don't feel good about that code being on the happy path when it's a special case. Not for performance reasons, but because it makes the function intent less clear. - It's extremely helpful to have tests before starting to refactor code. Hopefully, the code already has tests, but that wasn't true in this case. So I would have written tests on the existing code first, to make sure I didn't break things when I was refactoring.
Thanks for your feedback. I will add the response to the PR #306.
I have run into this bug too....
Have you considered using a REGEX for your pattern matching in grep
? You could return "T" for any instance of "low" UNLESS it is followed by a period. For instance:
grepl(
"low($|[^.]).*",
c("LOW.High", "LOW.Low", "Low", "low", "MSFT.Low", "slow", "Lower."),
ignore.case = T
)
[1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE
In the pattern low($|[^.]).*
, low
means match the letters l, o, w, in order.
($|[^.])
means then match the end of the string (the '$") or any character except a period (^ within [ ] means match anything other the characters following ^ within the [ ]). The . outside of [ ] means match any character at all and the * means zero of more times. So basically this tells the parser to "match 'low' followed by either the end of the string or as many other characters as you like, but just not a period.
I can think of one instance when this solution would fail: a ticker containing "LOW" followed by another letter. For instance, if there were a ticker "FLOWR", the bug would re-emerge.
The only other way I can see to solve this would be to match only columns that end in "low," which seems like a reasonable permanent solution. To do that, use "low$" as your grep pattern:
grepl(
"low($|[^.]).*",
c("LOW.High", "LOW.Low", "Low", "low", "MSFT.Low", "slow", "Lower."),
ignore.case = T
)
[1] FALSE TRUE TRUE TRUE TRUE TRUE FALSE
Perhaps implement the first solution now and warn users that in a future version, the column names will need to end with the key search words.
Thanks for your feedback. The proposed solution in the pull request is to catch exact matches first and secondly perform a regex with 'grep('Low$', colnames(x), ignore.case = TRUE)'. Similar to your second option. The first option fails for example on the symbol of this ETF.
The proposed solution works with how the columns are added in other quantmod functions, and by being more strict reduces the risk of interference with other packages/code. The downside is a risk to break code that people may have implemented based on the current matching.