tidyquant
feature: add Rblpapi package as a data source
Rblpapi, written by Dirk Eddelbuettel et al., interfaces to the Bloomberg terminal. This is, of course, a non-free data source, so it is not going to be used by as many people as Yahoo, Quandl, et al. Still, Rblpapi gets about 1500 downloads a month. It might be worth wrapping it. It's a small package with only a few functions. To whet your appetite I prototyped wrapping one function, Rblpapi::bdh(), which would be wrapped by tq_get(). Easy peasy!
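library(dplyr)  # bind_rows(), select(), group_by(), %>%
library(tidyr)  # separate()

tq_get_blpapi <- function(secs, get = "bdh", ...) {
  if (!("bdh" %in% get)) {
    stop("only bdh is implemented")
  }
  Rblpapi::blpConnect() # must have a valid blp session running
  # pass any further arguments (fields, start.date, options, ...) straight to bdh()
  blp_bdh <- Rblpapi::bdh(secs, ...)
  # tidy what Bloomberg gives us: one row per security and date
  blp_bdh_tq <- bind_rows(blp_bdh, .id = "symbol") %>%
    separate(symbol, c("symbol", "sector")) %>%
    select(date, symbol, everything()) %>%
    group_by(sector, symbol)
  return(blp_bdh_tq)
}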
The bit above is the TL;DR part. Here is the vignette with a dataset conformed to look like what the native Rblpapi::bdh() function returns.
# ----------------------------------------------------------
# What an Rblpapi::bdh() wrapper might look like in tidyquant
library(tidyquant) # tq_get(), tq_transmute()
library(dplyr)
library(tidyr)
library(ggplot2)
# form of the bdh function
START_DATE=as.Date("2007-01-01")
secs = c('SPX Index','AGTHX Equity','FDEGX Equity','LD12TRUU Index')
#BDH_OPTIONS = c("periodicitySelection"="MONTHLY")
#BBG_bdh <-Rblpapi::bdh(secs,
# fields=c('RETURN','PRICE'), # not valid fields
# start.date = START_DATE,
# end.date=Sys.Date(),
# options=BDH_OPTIONS)
#create an object that looks like what bdh will return using only public sources
#get from FRED
tbills_3mo <- tq_get("DGS3MO", get = "economic.data")
# convert percent to decimal, then to a monthly rate
risk_free_rate <- tbills_3mo %>% transmute(date, Rf = price / 100)
risk_free_rate_monthly <- risk_free_rate %>%
  tq_transmute(select = Rf,
               mutate_fun = to.monthly,
               indexAt = "lastof",
               col_rename = "return") %>%
  mutate(return = return / 12)
funds <- tq_get(c("^SP500TR", "AGTHX", "FDEGX"),
                get = "stock.prices",
                from = "2007-01-01") %>%
  group_by(symbol)
# convert to monthly returns
funds_monthly <- funds %>%
  tq_transmute(select = adjusted,
               mutate_fun = periodReturn,
               indexAt = "lastof",
               period = "monthly",
               col_rename = "return")
funds_monthly
# simulate what Rblpapi::bdh() will return
blp_bdh <- list(
  `LD12TRUU Index` = risk_free_rate_monthly %>%
    select(date, return) %>%
    mutate(wealth = cumprod(1 + return)) %>%
    as.data.frame(),
  `SPX Index` = funds_monthly %>%
    filter(symbol == "^SP500TR") %>%
    ungroup() %>%
    select(date, return) %>%
    mutate(wealth = cumprod(1 + return)) %>%
    as.data.frame(),
  `AGTHX Equity` = funds_monthly %>%
    filter(symbol == "AGTHX") %>%
    ungroup() %>%
    select(date, return) %>%
    mutate(wealth = cumprod(1 + return)) %>%
    as.data.frame(),
  `FDEGX Equity` = funds_monthly %>%
    filter(symbol == "FDEGX") %>%
    ungroup() %>%
    select(date, return) %>%
    mutate(wealth = cumprod(1 + return)) %>%
    as.data.frame()
)
blp_bdh
# Convert what Rblpapi returns to something like what tq_get returns
blp_bdh_tq <- bind_rows(blp_bdh, .id = "symbol") %>%
  separate(symbol, c("symbol", "sector")) %>%
  select(date, symbol, everything()) %>%
  group_by(sector, symbol)
blp_bdh_tq
# a prototype tq_get for Rblpapi. Embed the tidy conversion routine from above
tq_get_blpapi <- function(secs, get = "bdh", ...) {
  if (!("bdh" %in% get)) {
    stop("only bdh is implemented")
  }
  Rblpapi::blpConnect() # must have a valid blp session running
  blp_bdh <- Rblpapi::bdh(secs, ...)
  blp_bdh_tq <- bind_rows(blp_bdh, .id = "symbol") %>%
    separate(symbol, c("symbol", "sector")) %>%
    select(date, symbol, everything()) %>%
    group_by(sector, symbol)
  return(blp_bdh_tq)
}
# Working BBG data get. Won't work if no blp session is running. Skip to 'fake it' below.
# CAUTION: The order of the returned series may not be the same as the order of the inputs.
# CAUTION: There is no guarantee that all securities will have values for the same dates.
opts <- c("periodicitySelection" = "MONTHLY")
fields <- c("DAY_TO_DAY_TOT_RETURN_GROSS_DVDS", "LAST_PRICE")
funds_and_bm <- tq_get_blpapi(secs,
                              fields = fields,
                              options = opts,
                              start.date = as.Date("2015-01-01"),
                              end.date = Sys.Date())
#fake it if no blpapi
#not using actual BBG field names which I usually rename anyway
funds_and_bm<-blp_bdh_tq
#looks okay?
funds_and_bm %>%
  summarize(Avg_Return = mean(return), n = n())
## # A tibble: 4 x 4
## # Groups: sector [?]
##   sector symbol     Avg_Return     n
##   <chr>  <chr>           <dbl> <int>
## 1 Equity AGTHX    7.340296e-03   126
## 2 Equity FDEGX    6.885127e-03   126
## 3 Index  LD12TRUU 1.058918       126
## 4 Index  SPX      7.043102e-03   126
ggplot(funds_and_bm,aes(x=date,y=wealth,color=symbol))+geom_line()
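The two cautions in the vignette (return order and mismatched dates) could be checked with something like the sketch below. It is only an illustration: `mock_bdh_tq`, `secs_requested`, and the values in it are made up, and nothing here calls Bloomberg.

```r
library(dplyr)

# Hypothetical stand-in for the tidied output of tq_get_blpapi();
# column names mirror the vignette, the values are invented.
mock_bdh_tq <- tibble(
  symbol = c("SPX", "SPX", "SPX", "AGTHX", "AGTHX"),
  date   = as.Date(c("2016-01-31", "2016-02-29", "2016-03-31",
                     "2016-01-31", "2016-03-31")),
  return = c(0.01, -0.02, 0.03, 0.015, 0.02)
)

# Restore the order of the request instead of trusting the order returned.
secs_requested <- c("AGTHX", "SPX")
ordered <- mock_bdh_tq %>%
  mutate(symbol = factor(symbol, levels = secs_requested)) %>%
  arrange(symbol, date)

# Observations per security: unequal counts hint at missing dates.
coverage <- mock_bdh_tq %>% count(symbol, name = "n_obs")

# Dates on which at least one security has no value.
n_secs <- n_distinct(mock_bdh_tq$symbol)
incomplete_dates <- mock_bdh_tq %>%
  count(date, name = "n_symbols") %>%
  filter(n_symbols < n_secs)
```

Whether to drop the incomplete dates or keep them as gaps depends on the analysis, so the sketch only flags them.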
Thanks for considering this. I am having a blast playing around with tidyquant and think it will help push the notion of the tidyverse as a meta language on top of R.
Great feature request. We are actually looking into this and other financial data sources at the moment. I agree 100% and thanks for the sample code. We'll certainly be evaluating this the second half of the year. 👍
Let me know if I can help, alpha-testing or whatever.
Will do. It may be a bit since we'd like to have some conversations with Bloomberg before jumping in.
Hey, we are getting ready to begin the Rblpapi integration. Wanted to see if you or others would be available for some alpha testing!
I am honored to help out any way I can. I will test code promptly and partner with a-player-to-be-named in my organization if I am indisposed.
Thanks Art. Davis is going to test in the next day or so. I was flying blind while I coded, so I suggest we let Davis do his thing first. Then once he and I agree, let's get you or someone to alpha test. Davis has access to a terminal.
Ok, we have good news and bad news. Good news is we have it working. Bad news is the integration is slower than bdh() because we currently iterate over the securities one at a time instead of requesting them all at once (we will correct this). You can test it out, but just note that we will be reworking it to improve performance shortly. You can try something like this...
my_bloomberg_data <- c('SPX Index', 'AGTHX Equity') %>%
  tq_get(get = "rblpapi",
         rblpapi_fun = "bdh",
         fields = c('RETURN', 'PRICE'),
         options = c("periodicitySelection" = "MONTHLY"),
         from = "2016-01-01",
         to = "2016-12-31")
You can also test out other functions like bds() and bdp() by changing the rblpapi_fun argument (default is "bdh").
Let me know what you like/dislike about the API integration (besides performance) and any questions you may have.
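To make the iteration-versus-batch point concrete, here is a rough sketch that counts requests under each approach. `mock_bdh()` is a made-up stand-in, not the real Rblpapi::bdh(), and it is assumed to always return a named list with one data frame per security.

```r
# Mock data source standing in for Rblpapi::bdh(); it counts how often
# it is called and returns one data frame per requested security.
call_count <- 0
mock_bdh <- function(securities) {
  call_count <<- call_count + 1
  out <- lapply(securities, function(s) {
    data.frame(date = as.Date("2016-01-31"), PX_LAST = 100)
  })
  names(out) <- securities
  out
}

secs <- c("SPX Index", "AGTHX Equity", "FDEGX Equity")

# Iterating: one request per security.
call_count <- 0
iterated <- unlist(lapply(secs, mock_bdh), recursive = FALSE)
calls_iterated <- call_count

# Batched: a single request covering all securities.
call_count <- 0
batched <- mock_bdh(secs)
calls_batched <- call_count
```

Both approaches end up with the same named list, but the iterated version makes one round trip per security, which is presumably where the slowdown comes from.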
Let's get started. https://github.com/apsteinmetz/tq_testing.git
Not sure of the best way to share output. The html file in the repo shows it.
@apsteinmetz, looks good. As to your comments about from/to, you should be able to pass either:
- from/to as character vectors
- start.date/end.date as proper Dates
Internally we translate from/to into start.date/end.date if they are present. This is to maintain consistency with the other get functions that allow a date range!
I don't think from/to are getting translated:
Error in bdh(securities = "AGTHX Equity", fields = c("PX_LAST", "TOT_RETURN_INDEX_GROSS_DVDS": unused arguments (from = "2016-01-01", to = "2016-12-31")
Let me take a look.
Try now. I now remove from/to once they have been converted to start.date and end.date. Hopefully that works.
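A minimal sketch of that kind of translation follows. It is not the actual tidyquant code: `translate_date_args()` and `mock_bdh()` are hypothetical names, and the mock only mirrors the start.date/end.date argument names of bdh().

```r
# Hypothetical helper: translate tq_get-style from/to into the
# start.date/end.date arguments that bdh() expects.
translate_date_args <- function(args) {
  if (!is.null(args$from) && is.null(args$start.date)) {
    args$start.date <- as.Date(args$from)
  }
  if (!is.null(args$to) && is.null(args$end.date)) {
    args$end.date <- as.Date(args$to)
  }
  # Drop the originals so they are not forwarded as unused arguments.
  args$from <- NULL
  args$to <- NULL
  args
}

# Mock with the same date argument names as bdh(); makes no Bloomberg call.
mock_bdh <- function(securities, fields, start.date, end.date = NULL) {
  list(securities = securities, start.date = start.date, end.date = end.date)
}

args <- list(securities = "AGTHX Equity", fields = "PX_LAST",
             from = "2016-01-01", to = "2016-12-31")
result <- do.call(mock_bdh, translate_date_args(args))
```

Without the two `NULL` assignments, `do.call()` would pass from and to along to the mock and fail with the same "unused arguments" error reported above.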
from/to works now
Updated repo with more bdh variants. Seems pretty robust. Will look at bdp now and put tq_get in some of my production code.
Now in friendly github format! Look at the .md file.
Looks pretty good from the MD doc.
How is the speed? Slow?
Right now it's set up for bds and bdp also. Let me know what else should be added.
tq_get(bdh) is pretty slow. See repo. In my use cases I wouldn't notice the difference, though. I grab the historical data once and then play with it. Still have to get to other functions. Day job, and all....
tq_get(bdp) tested. Looks good so far. See repo.
Haha, totally understand that open source software may not be your only thing going on. ;)
I checked the bdp results and it looks like the operation is working.
What are your thoughts on the API - Is it intuitive and how you would like to be able to interact with Bloomberg / Rblpapi? I'm interested in making it as easy and straightforward to use as possible.
Should speed be improved? Will there be demand for users to download a lot of Bloomberg data? My gut feel is it should, and I have a few ideas to improve speed.
bds() has problems. The parameter names for bds() differ slightly from bdp() and bdh(): 'securities' becomes 'security', and that blows up piping in the ticker symbols and/or the default x parameter, which seems to invariably map to securities. Likewise, 'fields' becomes 'field' in bds(). See the error messages in the repo (the ...bds.md file).
> What are your thoughts on the API - Is it intuitive and how you would like to be able to interact with Bloomberg / Rblpapi?
I think it's very easy to use! The syntax is the same as the native Rblpapi functions with just the two additional tq_get parameters. The key benefit of tq_get in all the APIs is returning a tidy formatted table and moving out of the xts/zoo domain into the lubridate/tibbletime (Ugh, that name!) world, which is where so many people are moving.
On the speed issue, I don't know. Bloomberg limits how much data you can suck up daily and monthly (look at the PDF I put in the repo). They don't specifically tell you what it is, just that the hammer might come down. I don't think the BBG service is suited for 'big data' work. As such, speed won't be an issue for most users. I will ask around though.
Well that's a bummer about "bds". I checked the bds test report and it certainly looks like tq_get is struggling with the output. I'll have to investigate a bit more this weekend. Can you upload to GitHub an rds file of the output of the native "bds" call to Rblpapi? I want to use it to debug. I'd also like to do the same thing for the native call to "bdp" and "bdh" with multiple securities so I can test different methods for speed.
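For anyone following along, sharing native output as an rds file could look roughly like this. The data frame is a made-up stand-in for a bds() result (a real call needs a Bloomberg session), and the column names and file name are hypothetical.

```r
# Hypothetical stand-in for the result of a native Rblpapi::bds() call;
# the column names and values are invented for illustration.
mock_bds <- data.frame(
  holder  = c("Holder A", "Holder B"),
  percent = c(6.1, 5.4),
  stringsAsFactors = FALSE
)

# Save the object exactly as returned, so class and column types survive.
rds_path <- file.path(tempdir(), "bds_native.rds")
saveRDS(mock_bds, rds_path)

# Whoever debugs the wrapper can reload it without terminal access.
restored <- readRDS(rds_path)
identical(restored, mock_bds)
```

Unlike a CSV export, the rds round trip keeps the object's structure intact, which matters when the bug is in how that structure gets tidied.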
Regarding the feedback on the function, it sounds like we may be ready to move forward pending the speed concern and the BDS investigation. I have to get an update to CRAN soon because Yahoo Finance just took away the "key.stats" API, which is a bummer and now my tests are failing requiring a CRAN update. I'd like to include Bloomberg, Alpha Vantage, and Google Finance integrations.
added rds files with output from native calls.
The bds issue seems simple enough. You just have to recognize that the default first parameter securities should become security. No?
It is totally fine to release a version with just bdp() and bdh() working. That will cover 90% of use cases, I suspect. It covers 100% of mine.
I'm leaning towards releasing a first version soon. I just updated to change securities to security for "bds". I'd like to test a few things to improve speed, but not sure how long this will take. Let me know if "bds" is working now, and we'll call this issue closed if it checks out.
No bueno. Same errors. See repo.
This still fails even after devtools::install_github("business-science/tidyquant")?
my_bloomberg_data <- c('GOOG US Equity') %>%
  tq_get(get = "rblpapi",
         rblpapi_fun = "bds",
         field = c("TOP_20_HOLDERS_PUBLIC_FILINGS"))
No. That works. Piping in works. So does assigning the ticker to 'x'. Assigning the ticker to 'security' still fails with the complaint that x is missing.
Oh, yes. I override the "security" argument. Use "x" instead.
Is that the behavior you want? bds() takes the 'security' parameter. Not being able to name it seems a problem.
From the docs:
Usage:
bds(security, field, options = NULL, overrides = NULL, verbose = FALSE,
identity = NULL, con = defaultConnection())
Yes, I'd prefer to have the first parameter always be tq_get(x = symbols) so all API calls have the same format. Otherwise it becomes difficult for Matt the Developer to manage each combination, and ultimately makes it more difficult for Matt the User to remember which arguments to use.
It boils down to making tq_get() as consistent as possible for all APIs. For example, the Yahoo API call is very similar to the Quandl API call, which in turn is very similar to the Bloomberg API call. We just map "x" internally to whatever the base function's argument is called (e.g. "security" or "securities").
# Yahoo
tq_get(x = "AAPL", get = "stock.prices")
# Quandl
tq_get(x = "WIKI/AAPL", get = "quandl")
# Bloomberg
tq_get(x = 'AAPL US Equity', get = "rblpapi", rblpapi_fun = "bds")
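One way such an internal mapping might work is sketched below. This is an assumption about the design, not the tidyquant source: `native_first_arg`, `call_with_x()`, and the mock functions are hypothetical, and the mocks only mirror the first-argument names of bdp() and bds().

```r
# Hypothetical lookup: the name of the first argument of each native function.
native_first_arg <- c(bdh = "securities", bdp = "securities", bds = "security")

# Hypothetical dispatcher: rename x to whatever the chosen function expects,
# then forward the remaining arguments unchanged.
call_with_x <- function(x, fun_name, funs, ...) {
  args <- c(setNames(list(x), native_first_arg[[fun_name]]), list(...))
  do.call(funs[[fun_name]], args)
}

# Mocks that only mirror the native argument names; no Bloomberg calls are made.
mock_funs <- list(
  bdp = function(securities, fields) paste("bdp:", securities, fields),
  bds = function(security, field) paste("bds:", security, field)
)

bdp_result <- call_with_x("AAPL US Equity", "bdp", mock_funs,
                          fields = "PX_LAST")
bds_result <- call_with_x("AAPL US Equity", "bds", mock_funs,
                          field = "TOP_20_HOLDERS_PUBLIC_FILINGS")
```

The user only ever names x; the remaining arguments (fields versus field) still follow the native function, which is the inconsistency discussed in this thread.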
I think that's a valid design choice. The tradeoff is that you invalidate the reference docs for the native function, which you ask users to refer to for the details of the function. Some parameters DO use the native function assignments and some (well, one) don't. It's inconsistent.
BTW, I am re-writing a notebook in tidyverse/quant vernacular that analyzes a whole mutual fund family and I'm saving a lot of lines while making the code much clearer. Fun! I'll share when I'm done.