quantmod
quantmod copied to clipboard
getSymbols.yahoo wrong retrievals when from and to dates are supplied
Description
When specifying the from and to periods in getSymbols with source yahoo the expected results are often incorrect. I tested this on several markets (US, Europe, Argentina and Australia) and only the data from Europe is correct. Either you are missing the last requested date (from) or in the Australian case you even get an earlier date you didn't specify. Setting your system to the local timezone makes no difference.
I just want people to be aware of this issue as this is not something that can be solved by quantmod. The data sent to yahoo the yahoo servers is correct, but if you look in the temp file that captures the yahoo data that is sent back, you can see that incorrect results are returned by yahoo.
This might be related to issue #237.
It once more makes it clear that free data comes with it's own issues.
Expected behavior
I expected that outside of holidays the data would be returned when specifying the from and to dates.
Minimal, reproducible example
# Europe Brussels, US, Argentina, Australia
symbols <- c("FAGR.BR", "MSFT", "PAMP.BA", "SUN.AX")
stock_list <- lapply(symbols,
getSymbols,
src="yahoo",
from = "2018-11-07",
to = "2018-11-09",
auto.assign = F)
[[1]]
FAGR.BR.Open FAGR.BR.High FAGR.BR.Low FAGR.BR.Close FAGR.BR.Volume FAGR.BR.Adjusted
2018-11-07 14.90 15.34 14.89 15.33 109657 15.33
2018-11-08 15.35 15.44 14.93 14.95 74946 14.95
2018-11-09 14.87 14.89 14.63 14.67 64319 14.67
[[2]]
MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted
2018-11-07 109.44 112.24 109.40 111.96 37901700 111.96
2018-11-08 111.80 112.21 110.91 111.75 25644100 111.75
[[3]]
PAMP.BA.Open PAMP.BA.High PAMP.BA.Low PAMP.BA.Close PAMP.BA.Volume PAMP.BA.Adjusted
2018-11-07 50.60 51.3 49.9 50.15 689643 50.15
2018-11-08 49.95 51.8 49.0 50.00 844107 50.00
[[4]]
SUN.AX.Open SUN.AX.High SUN.AX.Low SUN.AX.Close SUN.AX.Volume SUN.AX.Adjusted
2018-11-06 13.94 14.03 13.785 13.83 2273201 13.83
2018-11-07 13.88 14.06 13.880 13.99 1944805 13.99
2018-11-08 13.90 14.14 13.900 14.08 1911448 14.08
Session Info
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=Dutch_Netherlands.1252 LC_CTYPE=Dutch_Netherlands.1252 LC_MONETARY=Dutch_Netherlands.1252
[4] LC_NUMERIC=C LC_TIME=Dutch_Netherlands.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] quantmod_0.4-13 TTR_0.23-4 xts_0.11-2 zoo_1.8-4
loaded via a namespace (and not attached):
[1] compiler_3.5.0 tools_3.5.0 curl_3.2 yaml_2.2.0 grid_3.5.0 lattice_0.20-35
This seemed familiar to me, and I searched and found the analysis you did for #211. Do you think this is the same or related?
@joshuaulrich
You are partly right. I ran the same code, but now for dates in the summer. Europe and Australia are correct, but the Americas are still of. So Australia is indeed related to issue #211. But for the Americas it is a different issue. Most people will not notice it because they either use the default retrieval settings or just set the start date.
I haven't tested if this occurs with alphaVantage or Tiingo. I will give that a go later today.
stock_list_summmer <- lapply(symbols,
getSymbols,
src="yahoo",
from = "2018-08-15",
to = "2018-08-17",
auto.assign = F).
stock_list_summmer
[[1]]
FAGR.BR.Open FAGR.BR.High FAGR.BR.Low FAGR.BR.Close FAGR.BR.Volume FAGR.BR.Adjusted
2018-08-15 16.29 16.32 16.05 16.14 198152 16.14
2018-08-16 16.10 16.24 15.94 16.09 117223 16.09
2018-08-17 16.01 16.20 15.92 16.03 93161 16.03
[[2]]
MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted
2018-08-15 108.49 108.99 106.82 107.66 29982800 107.1969
2018-08-16 108.30 108.86 107.30 107.64 21384300 107.1770
[[3]]
PAMP.BA.Open PAMP.BA.High PAMP.BA.Low PAMP.BA.Close PAMP.BA.Volume PAMP.BA.Adjusted
2018-08-15 38.85 39.2 38.00 38.25 1125010 38.25
2018-08-16 38.75 39.4 37.55 38.30 1176490 38.30
[[4]]
SUN.AX.Open SUN.AX.High SUN.AX.Low SUN.AX.Close SUN.AX.Volume SUN.AX.Adjusted
2018-08-15 15.19 15.19 15.00 15.03 9117934 15.03
2018-08-16 15.10 15.38 15.01 15.34 3638013 15.34
2018-08-17 15.29 15.29 15.12 15.22 2387092 15.22
Additional tests:
-
AlphaVantage doesn't have a from / to option, so that test just returns all the data for the last 100 days but at least including today.
-
Tiingo returns the data from MSFT correctly. But Tiingo doesn't have the data for the Argentinian stock nor for the European and Australian ones.
Yahoo Finance is adding subscriptions in January: https://www.axios.com/verizon-restructures-media-division-launches-subscriptions-1541966320-7cc14b71-89b4-4ea2-86be-a36b378e8cce.html
They expect people to pay for cr** data? Wow
There is a line in quantmod::getSymbols.yahoo to set the end-date of the interval:
to.posix <- .dateToUNIX(to)
.dateToUNIX uses as.Date() which 'ignores' the entire time of the day. Here are the tests:
> quantmod:::.dateToUNIX('2018-11-09')
[1] 1541721600 # this end-time is currently used
> as.POSIXct(1541721600, origin = "1970-01-01", tz='UTC')
[1] "2018-11-09 UTC" # which is midnight UTC, zero seconds
> as.POSIXct(1541721601, origin = "1970-01-01", tz='UTC') # let check one second later
[1] "2018-11-09 00:00:01 UTC" # midnight + 1sec, day barely started
> as.numeric(as.POSIXct('2018-11-09 23:59:59', origin = "1970-01-01", tz='UTC'))
[1] 1541807999 # with this real end-of-day UTC time, results are correct, see below
handle <- quantmod:::.getHandle(list(), force.new = TRUE)
symbols <- c("FAGR.BR", "MSFT", "PAMP.BA", "SUN.AX")
from.posix <- quantmod:::.dateToUNIX('2018-11-07')
to.posix <- as.numeric(as.POSIXct('2018-11-09 23:59:59', origin = "1970-01-01", tz='UTC'))
stock_list <- setNames(lapply(symbols, function(sym) {
yahoo.URL <- paste0('https://query1.finance.yahoo.com/v7/finance/download/',sym,
'?period1=',from.posix,
'&period2=',to.posix,'&interval=1d&events=history&crumb=',handle$cb)
read.csv(curl::curl(yahoo.URL, handle = handle$ch), na.strings = "null")
}), symbols)
stock_list
$FAGR.BR
Date Open High Low Close Adj.Close Volume
1 2018-11-07 14.90 15.34 14.89 15.33 15.16382 109657
2 2018-11-08 15.35 15.44 14.93 14.95 14.78794 74946
3 2018-11-09 14.87 14.89 14.63 14.67 14.51097 64319
$MSFT
Date Open High Low Close Adj.Close Volume
1 2018-11-07 109.44 112.24 109.40 111.96 109.2506 37901700
2 2018-11-08 111.80 112.21 110.91 111.75 109.0457 25644100
3 2018-11-09 110.85 111.45 108.76 109.57 106.9184 32039200
$PAMP.BA
Date Open High Low Close Adj.Close Volume
1 2018-11-07 50.60 51.3 49.90 50.15 50.15 689643
2 2018-11-08 49.95 51.8 49.00 50.00 50.00 844107
3 2018-11-09 50.00 50.0 48.25 49.35 49.35 519031
$SUN.AX
Date Open High Low Close Adj.Close Volume
1 2018-11-07 14.3563 14.4490 14.1967 14.2430 12.7825 2207278
2 2018-11-08 14.2945 14.4799 14.2945 14.4078 12.9304 1888405
3 2018-11-09 14.3151 14.5623 14.3151 14.5005 13.0136 1856016
Double checking with the other sample interval 2018-08-15 to 2018-08-17 gives correct results as well. Since Yahoo deals with stock exchanges all over the world and requires timestamps (not dates), probably it tracks time at this scale - seconds.
Therefore to.posix in quantmod::getSymbols.yahoo may have a problem.
EDIT: this is bigger than I thought... Suppose to.posix is fixed, for instance like so
to.posix <- as.numeric(as.POSIXct(paste0(as.character(to),' 23:59:59'), origin = "1970-01-01", tz='UTC'))
When getSymbols is called without a 'to' date, it will return (surprise, surprise) an additional row with today's date OHLCV values. Until now we were thankful to get data up to yesterday, e.g.
print(getSymbols('IBM', from='2020-06-01', auto.assign=FALSE))
and eventually call getQuote for today's values. Thus the fix could break user code currently in place.
Revisiting this and I found that Yahoo itself provides unexpected times when you download the data from their site. Here's the URL I get when I go to the historical data page for FAGR.BR (Brussels exchange) and click the "download" link:
https://query1.finance.yahoo.com/v7/finance/download/FAGR.BR?period1=1658975527&period2=1690511527&interval=1d&events=history&includeAdjustedClose=true
Note the values for period1
and period2
. They correspond to 2022-07-27 21:32:07 CDT
and 2023-07-27 21:32:07 CDT
in my timezone, respectively. But it was only 14:32:07 CDT
when I visited the site.
My timezone is a negative offset from UTC, so they're even further in the future if I convert the numbers to POSIXct in UTC: .POSIXct(1690511527, tz = "UTC")
is 2023-07-28 02:32:07 UTC
(2:30 AM tomorrow).
I get this URL for MSFT: https://query1.finance.yahoo.com/v7/finance/download/MSFT?period1=1658955159&period2=1690491159&interval=1d&events=history&includeAdjustedClose=true
Those period values are very close to my local time 2023-07-27 15:52:39 CDT
. They're one hour ahead, which suggests the period values are based on the exchange time (MSFT exchange time is US/Eastern).
I need to test more, but it seems like this is the cause of the issue.
We would need to check for an exchange suffix to determine the exchange timezone based solely on the ticker. Instead of doing that, I'm going to add/subtract a day from the to
/from
dates and then subset the resulting xts object to the user-supplied dates.
I added/subtracted a day from the to/from dates and then subset the result. Then I discovered that Yahoo returns data for today while the market is open. I'm not sure if this is good or bad...
## MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted
## 2007-01-03 29.91 30.250 29.40 29.86 76935100 21.52598
## 2007-01-04 29.70 29.970 29.44 29.81 45774500 21.48994
## 2007-01-05 29.63 29.750 29.45 29.64 44607200 21.36739
## 2007-01-08 29.65 30.100 29.53 29.93 50220200 21.57644
## 2007-01-09 30.00 30.180 29.73 29.96 44636600 21.59807
## 2007-01-10 29.80 29.890 29.43 29.66 55017400 21.38180
## 2007-01-11 29.76 30.750 29.65 30.70 99464300 22.13154
## 2007-01-12 30.65 31.390 30.64 31.21 103972500 22.49919
## 2007-01-16 31.26 31.450 31.03 31.16 62379600 22.46314
## 2007-01-17 31.26 31.440 31.01 31.10 58519600 22.41990
## ...
## 2023-07-17 345.68 346.990 342.20 345.73 20363900 345.73001
## 2023-07-18 345.83 366.780 342.17 359.49 64872700 359.48999
## 2023-07-19 361.75 362.460 352.44 355.08 39732900 355.07999
## 2023-07-20 353.57 357.970 345.37 346.87 33778400 346.87000
## 2023-07-21 349.15 350.300 339.83 343.77 69368900 343.76999
## 2023-07-24 345.85 346.920 342.31 345.11 26678100 345.10999
## 2023-07-25 347.11 351.890 345.07 350.98 41637700 350.98001
## 2023-07-26 341.44 344.670 333.11 337.77 58383700 337.76999
## 2023-07-27 340.48 341.330 329.05 330.72 39563000 330.72000
## 2023-07-28 333.67 340.005 333.17 338.75 13756324 338.75000 <-- today
Actually, I don't think data for today is returned during market hours. I just ran the following code at ~ 12:20pm ET
Today data is only returned if to = Sys.Date() + 1
> getSymbols("MSFT", auto.assign = FALSE, from = '2023-07-25')
MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted
2023-07-25 347.11 351.89 345.07 350.98 41637700 350.98
2023-07-26 341.44 344.67 333.11 337.77 58383700 337.77
2023-07-27 340.48 341.33 329.05 330.72 39635300 330.72
2023-07-28 333.67 340.01 333.17 338.37 28463000 338.37
2023-07-31 336.92 337.70 333.36 335.92 25421600 335.92
> getSymbols("MSFT", auto.assign = FALSE, from = '2023-07-25', to = Sys.Date())
MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted
2023-07-25 347.11 351.89 345.07 350.98 41637700 350.98
2023-07-26 341.44 344.67 333.11 337.77 58383700 337.77
2023-07-27 340.48 341.33 329.05 330.72 39635300 330.72
2023-07-28 333.67 340.01 333.17 338.37 28463000 338.37
2023-07-31 336.92 337.70 333.36 335.92 25421600 335.92
> getSymbols("MSFT", auto.assign = FALSE, from = '2023-07-25', to = Sys.Date() + 1)
MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted
2023-07-25 347.11 351.89 345.07 350.98 41637700 350.98
2023-07-26 341.44 344.67 333.11 337.77 58383700 337.77
2023-07-27 340.48 341.33 329.05 330.72 39635300 330.72
2023-07-28 333.67 340.01 333.17 338.37 28463000 338.37
2023-07-31 336.92 337.70 333.36 335.92 25421600 335.92
2023-08-01 335.19 338.32 333.71 337.74 9212780 337.74
@billelev your observation is consistent with mine. Sorry for the confusion.
The reason it doesn't return data for today when you use Sys.Date()
is because Dates are midnight UTC. Midnight UTC on 2023-08-01 is +4 hours from America/New_York (the exchange timezone). So the request to Yahoo stops at 2023-07-31 20:00:00 America/New_York (UTC -04:00), which is outside of today's market hours.
My proposed change was to effectively use Sys.Date() + 1
so the request always included the data from the from
date.
EDIT: here's another oddity in current behavior. You get today's data if your timezone is Asia/Tokyo (because it's 2023-08-02 02:12:00 in Tokyo right now).
Sys.setenv(TZ = "Asia/Tokyo")
tail(quantmod::getSymbols("MSFT", auto.assign = FALSE))
## MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted
## 2023-07-25 347.11 351.89 345.07 350.98 41637700 350.98
## 2023-07-26 341.44 344.67 333.11 337.77 58383700 337.77
## 2023-07-27 340.48 341.33 329.05 330.72 39635300 330.72
## 2023-07-28 333.67 340.01 333.17 338.37 28463000 338.37
## 2023-07-31 336.92 337.70 333.36 335.92 25421600 335.92
## 2023-08-01 335.19 338.54 333.71 338.03 10426627 338.03
@joshuaulrich that makes sense.
I did some more testing this morning. The web page has the same date behavior. i.e. if you request historical data for today's date, it will display the date range with a today - 1 adjustment. I do think the most robust approach is to change the from/to dates by -1/+1 and then do a post-request filter.