feature: add Rblpapi package as a data source

Open apsteinmetz opened this issue 7 years ago • 37 comments

Rblpapi, written by Dirk Eddelbuettel et al., interfaces with the Bloomberg terminal. It is, of course, not a free data source, so it will not be used by as many people as Yahoo, Quandl, etc. Still, Rblpapi gets about 1500 downloads a month, so it might be worth wrapping. It's a small package with only a few functions. To whet your appetite I prototyped a wrapper for one function, Rblpapi::bdh(), which would be called through tq_get(). Easy peasy!

tq_get_blpapi<-function(secs,get="bdh",...){
  if (!("bdh" %in% get)) {
    stop("only bdh is implemented")
  }
  Rblpapi::blpConnect() #must have a valid blp session running
  blp_bdh  <-Rblpapi::bdh(secs,...) #pass the remaining arguments through unchanged
  #tidy what Bloomberg gives us.
  blp_bdh_tq<-bind_rows(blp_bdh,.id='symbol')%>%
    separate(symbol,c('symbol','sector')) %>%
    select(date,symbol,everything())%>%
    group_by(sector,symbol)
  return(blp_bdh_tq)
}

The bit above is the TL;DR part. Here is the vignette with a dataset conformed to look like what the native Rblpapi::bdh() function returns.

# ----------------------------------------------------------
# What a Rblpapi::bdh function might look like in tidyquant
library(tidyverse) #dplyr, tidyr and ggplot2 are used below
library(tidyquant)

#form of the bdh function
START_DATE=as.Date("2007-01-01")
secs = c('SPX Index','AGTHX Equity','FDEGX Equity','LD12TRUU Index')
#BDH_OPTIONS = c("periodicitySelection"="MONTHLY")
#BBG_bdh  <-Rblpapi::bdh(secs,
#                  fields=c('RETURN','PRICE'), # not valid fields 
#                  start.date = START_DATE,
#                  end.date=Sys.Date(),
#                  options=BDH_OPTIONS)


#create an object that looks like what bdh will return using only public sources
#get from FRED
tbills_3mo <- tq_get("DGS3MO", get = "economic.data")
#convert to monthly rate
risk_free_rate <- tbills_3mo %>% transmute(date,Rf=price/100)
risk_free_rate_monthly<-risk_free_rate %>% 
  tq_transmute(select = Rf, 
               mutate_fun = to.monthly,
               indexAt='lastof',
               col_rename="return") %>% 
  mutate(return=return/12)

funds  <- tq_get(c("^SP500TR",'AGTHX','FDEGX'), 
                 get = "stock.prices", 
                 from = "2007-01-01") %>% 
  group_by(symbol)
#convert to monthly returns
funds_monthly<-funds %>% 
  tq_transmute(select     = adjusted, 
               mutate_fun = periodReturn,
               indexAt = 'lastof',
               period     = "monthly", 
               col_rename = "return")

funds_monthly


#simulate what Rblpapi::bdh() will return
blp_bdh<-list(

  `LD12TRUU Index`=risk_free_rate_monthly %>% 
    select(date,return) %>% 
    mutate(wealth=cumprod(1+return)) %>% 
    as.data.frame(),

  `SPX Index`=funds_monthly %>% 
    filter(symbol=="^SP500TR") %>% 
    ungroup() %>% 
    select(date,return) %>% 
    mutate(wealth=cumprod(1+return)) %>% 
    as.data.frame(),

  `AGTHX Equity`=funds_monthly %>% 
    filter(symbol=='AGTHX') %>% 
    ungroup() %>% 
    select(date,return) %>% 
    mutate(wealth=cumprod(1+return)) %>% 
    as.data.frame(),

  `FDEGX Equity`=funds_monthly %>% 
    filter(symbol=='FDEGX') %>% 
    ungroup() %>% 
    select(date,return) %>% 
    mutate(wealth=cumprod(1+return)) %>% 
    as.data.frame()
)

blp_bdh

#Convert what Rblpapi returns to something like what tq_get returns
blp_bdh_tq<-bind_rows(blp_bdh,.id='symbol')%>%
  separate(symbol,c('symbol','sector')) %>%
  select(date,symbol,everything())%>%
  group_by(sector,symbol)
blp_bdh_tq

#a prototype tq_get for Rblpapi. Embed the tidy conversion routine from above
tq_get_blpapi<-function(secs,get="bdh",...){
  if (!("bdh" %in% get)) {
    stop("only bdh is implemented")
  }
  Rblpapi::blpConnect() #must have a valid blp session running
  blp_bdh  <-Rblpapi::bdh(secs,...) #pass the remaining arguments through unchanged
  blp_bdh_tq<-bind_rows(blp_bdh,.id='symbol')%>%
    separate(symbol,c('symbol','sector')) %>%
    select(date,symbol,everything())%>%
    group_by(sector,symbol)
  return(blp_bdh_tq)
}

# Working BBG data get. Won't work if no blp session is running. Skip to 'fake it' below.
# CAUTION: The order of the returned series may not be the same as the order of the inputs
# CAUTION: There is no guarantee that all securities will have values for the same dates.
opts = c("periodicitySelection"="MONTHLY")
fields=c('DAY_TO_DAY_TOT_RETURN_GROSS_DVDS',"LAST_PRICE")

funds_and_bm<-tq_get_blpapi(secs,
              fields=fields,
              options=opts,
              start.date=as.Date("2015-01-01"),
              end.date=Sys.Date())

#fake it if no blpapi
#not using actual BBG field names which I usually rename anyway
funds_and_bm<-blp_bdh_tq

#looks okay?
funds_and_bm %>% 
  summarize(Avg_Return=mean(return),n=n())
## A tibble: 4 x 4
## Groups:   sector [?]
##   sector symbol     Avg_Return     n
##   <chr>  <chr>           <dbl> <int>
## 1 Equity AGTHX    7.340296e-03   126
## 2 Equity FDEGX    6.885127e-03   126
## 3 Index  LD12TRUU 1.058918       126
## 4 Index  SPX      7.043102e-03   126

ggplot(funds_and_bm,aes(x=date,y=wealth,color=symbol))+geom_line()
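
One caveat on the separate() step, offered as a sketch rather than a tested fix: it assumes every ticker has exactly two space-delimited parts. A ticker that carries a market code, such as 'GOOG US Equity', has three, so a two-column split would not line up symbol and sector. The base-R helper below splits on the last space instead. `split_bbg_ticker` is a hypothetical name, not part of tidyquant or Rblpapi.

```r
# Hypothetical helper: split a Bloomberg-style ticker on its LAST space,
# so the final word becomes the sector and everything before it the symbol.
split_bbg_ticker <- function(ticker) {
  data.frame(
    symbol = sub(" [^ ]+$", "", ticker),  # drop the last word
    sector = sub("^.* ", "", ticker),     # keep only the last word
    stringsAsFactors = FALSE
  )
}

split_bbg_ticker(c("SPX Index", "AGTHX Equity", "GOOG US Equity"))
```

Whether the market code should stay in the symbol column is a design choice; the sketch keeps it there so the symbol remains unambiguous.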

Thanks for considering this. I am having a blast playing around with tidyquant and think it will help push the notion of the tidyverse as a meta language on top of R.

apsteinmetz avatar Jun 22 '17 22:06 apsteinmetz

Great feature request. We are actually looking into this and other financial data sources at the moment. I agree 100% and thanks for the sample code. We'll certainly be evaluating this in the second half of the year. 👍

mdancho84 avatar Jun 23 '17 01:06 mdancho84

Let me know if I can help, alpha-testing or whatever.

apsteinmetz avatar Jun 23 '17 03:06 apsteinmetz

Will do. It may be a bit since we'd like to have some conversations with Bloomberg before jumping in.

mdancho84 avatar Jun 23 '17 03:06 mdancho84

Hey, we are getting ready to begin the Rblpapi integration. Wanted to see if you or others would be available for some alpha testing!

mdancho84 avatar Nov 09 '17 15:11 mdancho84

I am honored to help out any way I can. I will test code promptly and partner with a-player-to-be-named in my organization if I am indisposed.

apsteinmetz avatar Nov 13 '17 19:11 apsteinmetz

Thanks, Art. Davis is going to test in the next day or so. I was flying blind while I coded, so I suggest we let Davis do his thing first. Then, once he and I agree, let's get you or someone else to alpha test. Davis has access to a terminal.

mdancho84 avatar Nov 13 '17 19:11 mdancho84

OK, we have good news and bad news. The good news is that we have it working. The bad news is that the integration is slower than bdh() because we currently iterate over the symbols one at a time rather than requesting them all in one call (we will correct this). You can test it out, but note that we will be reworking it to improve performance shortly. You can try something like this...

my_bloomberg_data <- c('SPX Index','AGTHX Equity') %>%
    tq_get(get         = "rblpapi",
           rblpapi_fun = "bdh",
           fields      = c('RETURN','PRICE'),
           options     = c("periodicitySelection" = "MONTHLY"),
           from        = "2016-01-01",
           to          = "2016-12-31")

You can also test out other functions like bds() and bdp() by changing the rblpapi_fun argument (default is "bdh").

Let me know what you like/dislike about the API integration (besides performance) and any questions you may have.
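
To illustrate the performance point, here is a minimal sketch contrasting per-symbol iteration with a single batched request. `fake_bdh` is a hypothetical stand-in that only counts how often it is called; it is not the Rblpapi function, and the idea that each call costs one round trip to the terminal is an assumption.

```r
# Hypothetical stand-in for a data request: counts calls instead of
# contacting Bloomberg, and returns one data frame per requested security.
n_calls <- 0
fake_bdh <- function(securities) {
  n_calls <<- n_calls + 1
  lapply(setNames(securities, securities), function(s) {
    data.frame(date = as.Date("2016-01-29"), PX_LAST = NA_real_)
  })
}

secs <- c("SPX Index", "AGTHX Equity", "FDEGX Equity")

# Iterating: one request per symbol, results flattened into one named list
iterated <- unlist(lapply(secs, fake_bdh), recursive = FALSE)
calls_iterated <- n_calls

# Batched: one request covering all symbols
n_calls <- 0
batched <- fake_bdh(secs)
calls_batched <- n_calls

c(iterated = calls_iterated, batched = calls_batched)
```

Both approaches yield a list with the same names; only the number of requests differs.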

mdancho84 avatar Nov 14 '17 23:11 mdancho84

Let's get started. https://github.com/apsteinmetz/tq_testing.git

Not sure of the best way to share output. The html file in the repo shows it.

apsteinmetz avatar Nov 15 '17 15:11 apsteinmetz

@apsteinmetz, looks good. As to your comments about from/to, you should be able to pass either:

  • from/to as character vectors
  • start.date/end.date as proper Dates

Internally we translate from/to into start.date/end.date if they are supplied. This is to maintain consistency with the other get functions that allow a date range!
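
A minimal sketch of what such a translation could look like, assuming the extra arguments arrive as a named list. `translate_dates` is a hypothetical helper, not the actual tidyquant internals. Dropping from/to after conversion matters because a function that does not define them would reject them as unused arguments.

```r
# Hypothetical helper: convert from/to into the start.date/end.date names
# used by Rblpapi::bdh(), then remove from/to so they are not passed on.
translate_dates <- function(args) {
  if (!is.null(args[["from"]]) && is.null(args[["start.date"]])) {
    args[["start.date"]] <- as.Date(args[["from"]])
  }
  if (!is.null(args[["to"]]) && is.null(args[["end.date"]])) {
    args[["end.date"]] <- as.Date(args[["to"]])
  }
  args[["from"]] <- NULL
  args[["to"]] <- NULL
  args
}

args <- translate_dates(list(fields = "PX_LAST",
                             from   = "2016-01-01",
                             to     = "2016-12-31"))
names(args)
```

In this sketch an explicitly supplied start.date or end.date takes precedence over from/to; that precedence is a choice made for illustration.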

DavisVaughan avatar Nov 15 '17 15:11 DavisVaughan

I don't think from/to are getting translated:

Error in bdh(securities = "AGTHX Equity", fields = c("PX_LAST", "TOT_RETURN_INDEX_GROSS_DVDS": unused arguments (from = "2016-01-01", to = "2016-12-31")

apsteinmetz avatar Nov 15 '17 17:11 apsteinmetz

Let me take a look.

mdancho84 avatar Nov 15 '17 20:11 mdancho84

Try now. I now remove from/to after they have been converted to start.date and end.date. Hopefully that works.

mdancho84 avatar Nov 15 '17 20:11 mdancho84

from/to works now

apsteinmetz avatar Nov 15 '17 21:11 apsteinmetz

Updated the repo with more bdh variants. Seems pretty robust. Will look at bdp now and put tq_get in some of my production code.
Now in friendly github format! Look at the .md file.

apsteinmetz avatar Nov 15 '17 22:11 apsteinmetz

Looks pretty good from the MD doc.

How is the speed? Slow?

Right now it’s set up for bds and bdp also. Let me know what else should be added.

mdancho84 avatar Nov 16 '17 00:11 mdancho84

tq_get(bdh) is pretty slow. See repo. In my use cases I wouldn't notice the difference, though. I grab the historical data once and then play with it. Still have to get to the other functions. Day job, and all....

apsteinmetz avatar Nov 16 '17 20:11 apsteinmetz

tq_get(bdp) tested. Looks good so far. See repo.

apsteinmetz avatar Nov 16 '17 22:11 apsteinmetz

Haha, totally understand that open source software may not be your only thing going on. ;)

I checked the bdp results and it looks like the operation is working.

What are your thoughts on the API - Is it intuitive and how you would like to be able to interact with Bloomberg / Rblpapi? I'm interested in making it as easy and straightforward to use as possible.

Should speed be improved? Will there be demand for users to download a lot of Bloomberg data? My gut feel is it should, and I have a few ideas to improve speed.

mdancho84 avatar Nov 17 '17 17:11 mdancho84

bds() has problems. The parameter names for bds() differ slightly from those of bdp() and bdh(): 'securities' becomes 'security', which breaks piping in the ticker symbols and/or the default x parameter, which seems to map invariably to 'securities'. Likewise, 'fields' becomes 'field' in bds(). See the error messages in the repo (the ...bds.md file).

apsteinmetz avatar Nov 17 '17 19:11 apsteinmetz

What are your thoughts on the API - Is it intuitive and how you would like to be able to interact with Bloomberg / Rblpapi?

I think it's very easy to use! The syntax is the same as the native Rblpapi functions, with just the two additional tq_get parameters. The key benefit of tq_get across all the APIs is that it returns a tidy table and moves out of the xts/zoo domain into the lubridate/tibbletime (ugh, that name!) world, which is where so many people are moving.

On the speed issue, I don't know. Bloomberg limits how much data you can suck up daily and monthly (look at the PDF I put in the repo). They don't specifically tell you what it is, just that the hammer might come down. I don't think the BBG service is suited for 'big data' work. As such, speed won't be an issue for most users. I will ask around though.

apsteinmetz avatar Nov 17 '17 22:11 apsteinmetz

Well that's a bummer about "bds". I checked the bds test report and it certainly looks like tq_get is struggling with the output. I'll have to investigate a bit more this weekend. Can you upload to GitHub an rds file of the output of the native "bds" call to Rblpapi? I want to use it to debug. I'd also like to do the same thing for the native call to "bdp" and "bdh" with multiple securities so I can test different methods for speed.

Regarding the feedback on the function, it sounds like we may be ready to move forward pending the speed concern and the bds investigation. I have to get an update to CRAN soon because Yahoo Finance just took away the "key.stats" API, which is a bummer; my tests are now failing, which requires a CRAN update. I'd like to include Bloomberg, Alpha Vantage, and Google Finance integrations.

mdancho84 avatar Nov 18 '17 11:11 mdancho84

Added rds files with output from the native calls.

The bds issue seems simple enough. You just have to recognize that the default first parameter, securities, should become security, no?

It is totally fine to release a version with just bdp() and bdh() working. That will cover 90% of use cases, I suspect. It covers 100% of mine.

apsteinmetz avatar Nov 18 '17 20:11 apsteinmetz

I'm leaning towards releasing a first version soon. I just updated to change securities to security for "bds". I'd like to test a few things to improve speed, but not sure how long this will take. Let me know if "bds" is working now, and we'll call this issue closed if it checks out.

mdancho84 avatar Nov 19 '17 11:11 mdancho84

No bueno. Same errors. See repo.

apsteinmetz avatar Nov 20 '17 14:11 apsteinmetz

This still fails even after devtools::install_github("business-science/tidyquant")?

my_bloomberg_data <- c('GOOG US Equity') %>%
    tq_get(get         = "rblpapi",
           rblpapi_fun = "bds",
           field       = c("TOP_20_HOLDERS_PUBLIC_FILINGS")
           )

mdancho84 avatar Nov 20 '17 15:11 mdancho84

No. That works. Piping in works. So does assigning the ticker to 'x'. Assigning the ticker to 'security' still fails with the complaint that x is missing.

apsteinmetz avatar Nov 27 '17 14:11 apsteinmetz

Oh, yes. I override the "security" argument. Use "x" instead.

mdancho84 avatar Nov 28 '17 00:11 mdancho84

Is that the behavior you want? bds() takes the 'security' parameter. Not being able to name it seems a problem. From the docs: Usage: bds(security, field, options = NULL, overrides = NULL, verbose = FALSE, identity = NULL, con = defaultConnection())

apsteinmetz avatar Nov 28 '17 20:11 apsteinmetz

Yes, I'd prefer to have the first parameter always be tq_get(x = symbols) so all API calls have the same format. Otherwise it becomes difficult for Matt the Developer to manage each combination, and ultimately makes it more difficult for Matt the User to remember which arguments to use.

It boils down to making tq_get() as consistent as possible across all APIs. For example, the Yahoo API call is very similar to the Quandl API call, which in turn is very similar to the Bloomberg API call. We just map whatever the base function calls its first argument (e.g. "security" or "securities") to "x" internally.

# Yahoo
tq_get(x = "AAPL", get = "stock.prices")

# Quandl
tq_get(x = "WIKI/AAPL", get = "quandl")

# Bloomberg
tq_get(x = "AAPL US Equity", get = "rblpapi", rblpapi_fun = "bds", field = "TOP_20_HOLDERS_PUBLIC_FILINGS")
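
The mapping of "x" onto each function's own first argument can be sketched roughly as follows. `fake_bdp`, `fake_bds` and `call_blp` are hypothetical names used only for illustration; the stand-ins mimic just the argument names of the native functions and do not contact Bloomberg, and this is not the actual tidyquant code.

```r
# Hypothetical stand-ins that mimic only the argument names of the native
# functions: bdp uses `securities`/`fields`, bds uses `security`/`field`.
fake_bdp <- function(securities, fields) paste("bdp:", securities, fields)
fake_bds <- function(security, field)    paste("bds:", security, field)

# Hypothetical dispatcher: the user always supplies `x`; the wrapper renames
# it to whatever the chosen function calls its first argument.
call_blp <- function(x, fun = c("bdp", "bds"), ...) {
  fun <- match.arg(fun)
  native <- switch(fun, bdp = fake_bdp, bds = fake_bds)
  first_arg <- names(formals(native))[1]
  args <- c(setNames(list(x), first_arg), list(...))
  do.call(native, args)
}

call_blp("AAPL US Equity", "bds", field = "TOP_20_HOLDERS_PUBLIC_FILINGS")
call_blp("AAPL US Equity", "bdp", fields = "PX_LAST")
```

Looking up the first formal by position keeps the wrapper free of per-function special cases, at the cost that the remaining arguments still follow each native function's own naming.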

mdancho84 avatar Nov 29 '17 16:11 mdancho84

I think that's a valid design choice. The tradeoff is that you invalidate the reference docs for the native function, which you ask users to refer to for the details. Some parameters DO use the native function's names and some (well, one) don't. It's inconsistent.

BTW, I am re-writing a notebook in tidyverse/quant vernacular that analyzes a whole mutual fund family and I'm saving a lot of lines while making the code much clearer. Fun! I'll share when I'm done.

apsteinmetz avatar Nov 30 '17 03:11 apsteinmetz