DBnomics.jl
Access DBnomics data series from Julia.
DBnomics Julia client
This package gives you access to DBnomics data series. DBnomics is an open-source project that aims to aggregate the world's economic data in one place, free of charge to the public. DBnomics covers hundreds of millions of series from international and national institutions (Eurostat, World Bank, IMF, ...).
To use this package, you have to provide the codes of the provider, dataset and series you want. You can find them directly on the website https://db.nomics.world.
To install DBnomics.jl, open the package manager by typing ] in the Julia REPL, then run:
add DBnomics
or install the GitHub version with:
add https://github.com/s915/DBnomics.jl
All the functions and their names are derived from the R package rdbnomics, which I also maintain.
Examples
Fetch time series by ids:
# Fetch one series from dataset 'Unemployment rate' (ZUTN) of the AMECO provider:
df1 = rdb(ids = "AMECO/ZUTN/EA19.1.0.0.0.ZUTN");
# Fetch two series from dataset 'Unemployment rate' (ZUTN) of the AMECO provider:
df2 = rdb(ids = ["AMECO/ZUTN/EA19.1.0.0.0.ZUTN", "AMECO/ZUTN/DNK.1.0.0.0.ZUTN"]);
# Fetch two series from different datasets of different providers:
df3 = rdb(ids = ["AMECO/ZUTN/EA19.1.0.0.0.ZUTN", "IMF/BOP/A.FR.BCA_BP6_EUR"]);
If ids is the only argument you use, you can drop its name and run:
df1 = rdb("AMECO/ZUTN/EA19.1.0.0.0.ZUTN");
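A series id is the provider code, the dataset code and the series code joined by slashes. The Base Julia sketch below splits an id back into its three codes, which can help check ids before passing them to rdb. split_series_id is a hypothetical helper, not part of DBnomics.jl, and it assumes the provider and dataset codes never contain a slash.

```julia
# Hypothetical helper (not part of DBnomics.jl): split a DBnomics series id
# "provider/dataset/series" into its three codes.
# Assumes provider and dataset codes contain no '/'; with limit = 3,
# any further '/' stays inside the series code.
function split_series_id(id::AbstractString)
    parts = split(id, '/'; limit = 3)
    length(parts) == 3 || throw(ArgumentError("expected provider/dataset/series, got: $id"))
    provider, dataset, series = parts
    return (provider = String(provider), dataset = String(dataset), series = String(series))
end

ids = ["AMECO/ZUTN/EA19.1.0.0.0.ZUTN", "IMF/BOP/A.FR.BCA_BP6_EUR"]
for id in ids
    println(split_series_id(id))
end
```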
Fetch time series by mask:
# Fetch one series from dataset 'Balance of Payments' (BOP) of IMF:
df1 = rdb("IMF", "BOP", mask = "A.FR.BCA_BP6_EUR");
# Fetch two series from dataset 'Balance of Payments' (BOP) of IMF:
df2 = rdb("IMF", "BOP", mask = "A.FR+ES.BCA_BP6_EUR");
# Fetch all series along one dimension from dataset 'Balance of Payments' (BOP) of IMF:
df3 = rdb("IMF", "BOP", mask = "A..BCA_BP6_EUR");
# Fetch series along multiple dimensions from dataset 'Balance of Payments' (BOP) of IMF:
df4 = rdb("IMF", "BOP", mask = "A.FR.BCA_BP6_EUR+IA_BP6_EUR");
If provider_code, dataset_code and mask are the only arguments you use, you can drop the name mask and run:
df1 = rdb("IMF", "BOP", "A.FR.BCA_BP6_EUR");
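As the examples above suggest, a mask has one entry per dimension, in the dataset's dimension order, separated by dots; several values of one dimension are joined with +, and an empty entry keeps every value of that dimension. The Base Julia sketch below builds such strings. build_mask is a hypothetical helper, not part of DBnomics.jl, and the dimension order used here (frequency, country, indicator) is inferred from the IMF/BOP examples.

```julia
# Hypothetical helper (not part of DBnomics.jl): build a mask string.
# Each element of `values_per_dimension` holds the codes wanted for one
# dimension, in the dataset's dimension order; an empty vector keeps all codes.
build_mask(values_per_dimension) = join((join(v, "+") for v in values_per_dimension), ".")

# Dimension order assumed from the IMF/BOP examples: frequency, country, indicator
m1 = build_mask([["A"], ["FR"], ["BCA_BP6_EUR"]])
m2 = build_mask([["A"], ["FR", "ES"], ["BCA_BP6_EUR"]])
m3 = build_mask([["A"], String[], ["BCA_BP6_EUR"]])
m4 = build_mask([["A"], ["FR"], ["BCA_BP6_EUR", "IA_BP6_EUR"]])
foreach(println, (m1, m2, m3, m4))
```

Each resulting string reproduces one of the masks above and can be passed as the mask argument, e.g. rdb("IMF", "BOP", mask = m2).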
Fetch time series by dimensions:
# Fetch one value of one dimension from dataset 'Unemployment rate' (ZUTN) of the AMECO provider:
df1 = rdb("AMECO", "ZUTN", dimensions = Dict(:geo => "ea12"));
# or
df1 = rdb("AMECO", "ZUTN", dimensions = (geo = "ea12",));
# Fetch two values of one dimension from dataset 'Unemployment rate' (ZUTN) of the AMECO provider:
df2 = rdb("AMECO", "ZUTN", dimensions = Dict(:geo => ["ea12", "dnk"]));
# or
df2 = rdb("AMECO", "ZUTN", dimensions = (geo = ["ea12", "dnk"],));
# Fetch several values of several dimensions from dataset 'Doing business' (DBS) of the World Bank:
df3 = rdb("WB", "DBS", dimensions = Dict(:country => ["DZA", "PER"], :frequency => ["A"], :indicator => ["ENF.CONT.COEN.COST.ZS", "IC.REG.COST.PC.FE.ZS"]));
# or
df3 = rdb("WB", "DBS", dimensions = (country = ["DZA", "PER"], frequency = ["A"], indicator = ["ENF.CONT.COEN.COST.ZS", "IC.REG.COST.PC.FE.ZS"]));
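The Dict and NamedTuple forms above carry the same information: a NamedTuple maps dimension names to values just like a Dict with Symbol keys. The Base Julia sketch below shows the correspondence, and why the trailing comma in (geo = "ea12",) matters. It does not show how DBnomics.jl handles the argument internally.

```julia
# NamedTuple form: the trailing comma makes it a one-element NamedTuple
nt = (geo = ["ea12", "dnk"],)

# Dict form with Symbol keys
d = Dict(:geo => ["ea12", "dnk"])

# pairs() iterates a NamedTuple as key => value pairs, so it converts directly
converted = Dict(pairs(nt))
println(converted == d)

# Without the trailing comma, the parentheses only wrap an assignment:
# (geo = "ea12") evaluates to the String "ea12", not to a NamedTuple
not_a_namedtuple = (geo = "ea12")
println(typeof(not_a_namedtuple))
```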
Fetch time series with a query:
# Fetch one series from dataset 'WEO:2019-10 by countries' (WEO:2019-10) of the IMF provider:
df1 = rdb("IMF", "WEO:2019-10", query = "France current account balance percent");
# Fetch series from dataset 'WEO:2019-10 by countries' (WEO:2019-10) of the IMF provider:
df2 = rdb("IMF", "WEO:2019-10", query = "current account balance percent");
Fetch series from the dataset 'Doing Business' (DBS) of the WB provider with an API link:
df1 = rdb(api_link = "https://api.db.nomics.world/v22/series/WB/DBS?dimensions=%7B%22country%22%3A%5B%22FRA%22%2C%22ITA%22%2C%22ESP%22%5D%7D&q=IC.REG.PROC.FE.NO&observations=1&format=json&align_periods=1&offset=0&facets=0");
If api_link is the only argument you use, you can drop its name and run:
df1 = rdb("https://api.db.nomics.world/v22/series/WB/DBS?dimensions=%7B%22country%22%3A%5B%22FRA%22%2C%22ITA%22%2C%22ESP%22%5D%7D&q=IC.REG.PROC.FE.NO&observations=1&format=json&align_periods=1&offset=0&facets=0");
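The dimensions parameter of such a link is a percent-encoded JSON object. To see which dimensions a copied link selects, you can decode its query string. The sketch below uses only Base Julia; percent_decode is a hypothetical helper written for this illustration, not part of DBnomics.jl.

```julia
# Hypothetical helper: decode %XX escapes in a URL component (Base Julia only)
function percent_decode(s::AbstractString)
    cu = codeunits(s)
    out = UInt8[]
    i = 1
    while i <= length(cu)
        if cu[i] == UInt8('%') && i + 2 <= length(cu)
            # The two hex digits after '%' encode one byte
            push!(out, parse(UInt8, String(cu[i+1:i+2]); base = 16))
            i += 3
        else
            push!(out, cu[i])
            i += 1
        end
    end
    return String(out)
end

link = "https://api.db.nomics.world/v22/series/WB/DBS?dimensions=%7B%22country%22%3A%5B%22FRA%22%2C%22ITA%22%2C%22ESP%22%5D%7D&q=IC.REG.PROC.FE.NO&observations=1&format=json&align_periods=1&offset=0&facets=0"

# Split the query string into key => decoded value pairs
query = split(link, '?'; limit = 2)[2]
params = Dict(String(kv[1]) => percent_decode(kv[2])
              for kv in (split(p, '='; limit = 2) for p in split(query, '&')))

println(params["dimensions"])
println(params["q"])
```

Decoded this way, the link above selects the countries FRA, ITA and ESP, with q = IC.REG.PROC.FE.NO.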
Fetch the available datasets of a provider
# Example with the IMF datasets:
df_datasets = rdb_datasets("IMF");
# Example with the IMF and BDF datasets:
df_datasets = rdb_datasets(["IMF", "BDF"]);
If you request the datasets of only one provider and set simplify = true, the result is a DataFrame instead of a Dict.
df_datasets = rdb_datasets("IMF", simplify = true);
Fetch the possible dimensions of available datasets of a provider
# Example for the dataset WEO:2019-10 of the IMF:
df_dimensions = rdb_dimensions("IMF", "WEO:2019-10");
If you request the dimensions of only one dataset of one provider and set simplify = true, the result is a Dict of DataFrames instead of a nested Dict.
df_dimensions = rdb_dimensions("IMF", "WEO:2019-10", simplify = true);
Fetch the number of series of available datasets of a provider
# Example for the dataset WEOAGG:2019-10 of the IMF:
df_series = rdb_series("IMF", "WEOAGG:2019-10");
# With dimensions
df_series = rdb_series("IMF", "WEO:2019-10", dimensions = Dict(Symbol("weo-country") => "AGO"));
df_series = rdb_series("IMF", "WEO:2019-10", dimensions = Dict(Symbol("weo-subject") => "NGDP_RPCH"), simplify = true);
# With a query
df_series = rdb_series("IMF", "WEO:2019-10", query = "ARE");
df_series = rdb_series("IMF", ["WEO:2019-10", "WEOAGG:2019-10"], query = "NGDP_RPCH");
Warning: please use this function sparingly, because a dataset can contain a huge number of series. Only fetch the series of a single dataset when you need them, or browse them on the website https://db.nomics.world.
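Dimension codes such as weo-country contain a hyphen, so they cannot be written as a Symbol literal: :weo-country is parsed as the subtraction :weo - country. This is why the rdb_series examples above build the key with Symbol("weo-country"). A quick Base Julia check:

```julia
# :weo-country is parsed as a call to `-`, not as a single Symbol
ex = Meta.parse(":weo-country")
println(ex isa Expr && ex.head == :call && ex.args[1] == Symbol("-"))

# Symbol("...") accepts any string, including hyphens
key = Symbol("weo-country")
dims = Dict(key => "AGO")
println(dims[Symbol("weo-country")])
```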
Proxy configuration
When using the functions rdb or rdb_..., if you run into an error related to your internet connection, you can work around it by:
- configuring the function HTTP.get or HTTP.post (through the curl_config option) to use a specific, authorized proxy;
- using the functions readlines and download if you have problems with HTTP.get.
Configure curl to use a specific and authorized proxy
By default, DBnomics.jl fetches the data with the function HTTP.get or HTTP.post. If a specific proxy must be used, you can define it for the whole session with the package option curl_config, or on the fly through the argument curl_config. In both cases, the object is passed as keyword arguments to HTTP.get or HTTP.post.
The available parameters are the keyword arguments accepted by HTTP.get and HTTP.post (see the HTTP.jl documentation).
Once you have chosen them, define the configuration object as follows:
h = Dict(:proxy => "http://<proxy>:<port>");
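Passing a Dict like h as keyword arguments relies on a general Julia feature: a Dict with Symbol keys (or a NamedTuple) can be splatted after ; in a function call. The Base Julia sketch below illustrates it; fake_get is a hypothetical stand-in for HTTP.get that only returns the keywords it receives, and the proxy address is the same placeholder as above.

```julia
# Hypothetical stand-in for HTTP.get: returns the keyword arguments it received
fake_get(url; kwargs...) = (url = url, kwargs = Dict(kwargs))

# Same shape as the proxy configuration above (placeholder address)
h = Dict(:proxy => "http://<proxy>:<port>")

# Splatting a Dict with Symbol keys after `;` turns each pair into a keyword
res = fake_get("https://api.db.nomics.world/v22/"; h...)
println(res.kwargs[:proxy])

# A NamedTuple can be splatted the same way
res2 = fake_get("https://api.db.nomics.world/v22/"; (proxy = "http://<proxy>:<port>",)...)
println(res2.kwargs == res.kwargs)
```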
Because of how HTTP.jl works, you might also need to change another option so that the db.nomics.world and editor.nomics.world URLs use http:// instead of https:// (see https://github.com/JuliaWeb/HTTP.jl/pull/390):
DBnomics.options("secure", false);
Set the connection up for a session
The proxy configuration can be set up for a whole session by modifying the following package option:
DBnomics.options("curl_config", h);
After configuration, just use the standard functions of DBnomics.jl, e.g.:
df1 = rdb(ids = "AMECO/ZUTN/EA19.1.0.0.0.ZUTN");
This package option can be disabled with:
DBnomics.options("curl_config", nothing);
Use the connection only for a function call
If you do not need a session-wide configuration but only an "on the fly" one, use the argument curl_config of rdb or of the rdb_... functions:
df1 = rdb(ids = "AMECO/ZUTN/EA19.1.0.0.0.ZUTN", curl_config = h);
Use the standard functions readlines and download
To retrieve the data, DBnomics.jl can also use the standard functions readlines and download.
Set the connection up for a session
To activate this feature for a session, enable the following package option:
DBnomics.options("use_readlines", true);
Then use the standard functions as usual:
df1 = rdb(ids = "AMECO/ZUTN/EA19.1.0.0.0.ZUTN");
This configuration can be disabled with:
DBnomics.options("use_readlines", false);
Use the connection only for a function call
If you only want to do it once, use the argument use_readlines of rdb or of the rdb_... functions:
df1 = rdb(ids = "AMECO/ZUTN/EA19.1.0.0.0.ZUTN", use_readlines = true);
Transform time series with filters
The DBnomics.jl package can interact with the Time Series Editor of DBnomics to transform time series by applying filters to them.
Available filters are listed on the filters page https://editor.nomics.world/filters.
Here is how to interpolate two annual time series to a monthly frequency using spline interpolation:
filters = Dict(:code => "interpolate", :parameters => Dict(:frequency => "monthly", :method => "spline"));
df = rdb(ids = ["AMECO/ZUTN/EA19.1.0.0.0.ZUTN", "AMECO/ZUTN/DNK.1.0.0.0.ZUTN"], filters = filters);
If you want to apply more than one filter, the filters argument must be a Tuple of valid filters:
filter1 = Dict(:code => "interpolate", :parameters => Dict(:frequency => "monthly", :method => "spline"));
filter2 = Dict(:code => "aggregate", :parameters => Dict(:frequency => "bi-annual", :method => "end_of_period"));
filters = (filter1, filter2);
df = rdb(ids = ["AMECO/ZUTN/EA19.1.0.0.0.ZUTN", "AMECO/ZUTN/DNK.1.0.0.0.ZUTN"], filters = filters);
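Each filter is a Dict with a :code entry naming the filter and a :parameters entry holding its options. When you build several filters, a small helper keeps them consistent. make_filter below is a hypothetical convenience function (not part of DBnomics.jl) that produces the same Dicts as the ones written out above.

```julia
# Hypothetical helper (not part of DBnomics.jl): build a filter Dict of the
# form Dict(:code => ..., :parameters => Dict(...)) from keyword arguments
make_filter(code::AbstractString; parameters...) =
    Dict(:code => code, :parameters => Dict(parameters))

filter1 = make_filter("interpolate"; frequency = "monthly", method = "spline")
filter2 = make_filter("aggregate"; frequency = "bi-annual", method = "end_of_period")

# Several filters are passed together as a Tuple
filters = (filter1, filter2)
```

The resulting filters Tuple can be passed to rdb exactly like the hand-written one.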
The DataFrame columns change a little when filters are used. There are two new columns:
- period_middle_day: the middle day of original_period (useful when you graphically compare interpolated series with the original ones).
- filtered (boolean): true if the series is filtered, false otherwise.
The content of two columns is modified:
- series_code: same as before for original series, but the suffix _filtered is added for filtered series.
- series_name: same as before for original series, but the suffix (filtered) is added for filtered series.
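Because original and filtered series come back in the same DataFrame, you will often want to separate them. The sketch below uses a small synthetic DataFrame (made-up values, not DBnomics output) with the series_code and filtered columns described above, and assumes the filtered column holds Bool values.

```julia
using DataFrames

# Synthetic stand-in for the output of rdb(...; filters = filters)
df = DataFrame(
    series_code = ["EA19.1.0.0.0.ZUTN", "EA19.1.0.0.0.ZUTN_filtered"],
    filtered = [false, true],
    value = [1.0, 2.0],
)

# Keep the original rows or the filtered rows with Boolean row indexing
original_rows = df[.!df.filtered, :]
filtered_rows = df[df.filtered, :]

# Strip the _filtered suffix to match filtered series with their originals
base_codes = [replace(c, r"_filtered$" => "") for c in filtered_rows.series_code]
println(base_codes)
```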
Transform the DataFrame object into a TimeArray object
For some analyses, it is more convenient to have a TimeArray object instead of a DataFrame object. To transform it, you can use the following functions:
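using DBnomics
using DataFrames
using TimeSeries

# Convert a DataFrame into a NamedTuple of columns (column name => vector)
function to_namedtuples(x::DataFrames.DataFrame)
    nm = names(x)
    # x[!, col] is not available in older DataFrames versions: fall back to x[:, col]
    vl = try
        [x[!, col] for col in nm]
    catch
        [x[:, col] for col in nm]
    end
    nm = tuple(Symbol.(nm)...)
    vl = tuple(vl...)
    NamedTuple{nm}(vl)
end

# Reshape the long DataFrame returned by rdb (one row per period and series)
# into a wide TimeArray (one column per series, indexed by period)
function to_timeseries(
    x::DataFrames.DataFrame,
    index = :period, variable = :series_code, value = :value
)
    x = unstack(x, index, variable, value)
    # Make sure the timestamps are in increasing order before building the TimeArray
    sort!(x, index)
    x = to_namedtuples(x)
    x = TimeArray(x, timestamp = index)
    x
end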
rdb("IMF", "BOP", mask = "A.FR+ES.BCA_BP6_EUR")
#> 162×18 DataFrame. Omitted printing of 12 columns
#> │ Row │ @frequency │ FREQ │ Frequency │ INDICATOR │ Indicator │ REF_AREA │
#> │ │ String │ String │ String │ String │ String │ String │
#> ├─────┼────────────┼────────┼───────────┼─────────────┼────────────────────────────────────┼──────────┤
#> │ 1 │ annual │ A │ Annual │ BCA_BP6_EUR │ Current Account, Total, Net, Euros │ ES │
#> │ 2 │ annual │ A │ Annual │ BCA_BP6_EUR │ Current Account, Total, Net, Euros │ ES │
#> │ ... │ ... │ ... │ ... │ ... │ ... │ ... │
#> │ 161 │ annual │ A │ Annual │ BCA_BP6_EUR │ Current Account, Total, Net, Euros │ FR │
#> │ 162 │ annual │ A │ Annual │ BCA_BP6_EUR │ Current Account, Total, Net, Euros │ FR │
to_timeseries(rdb("IMF", "BOP", mask = "A.FR+ES.BCA_BP6_EUR"))
#> 81×2 TimeArray{Union{Missing, Float64},2,Date,Array{Union{Missing, Float64},2}} 1940-01-01 to 2020-01-01
#> │ │ A.ES.BCA_BP6_EUR │ A.FR.BCA_BP6_EUR │
#> ├────────────┼──────────────────┼──────────────────┤
#> │ 1940-01-01 │ missing │ missing │
#> │ 1941-01-01 │ missing │ missing │
#> │ 1942-01-01 │ missing │ missing │
#> │ ... │ ... │ ... │
#> │ 2019-01-01 │ 24899.0 │ -16239.4 │
#> │ 2020-01-01 │ missing │ missing │
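In the TimeArray, missing marks periods for which a series has no value. Part of this comes from the unstack step: when the long DataFrame has no row for a given period and series, the wide table gets missing at that position. The Base Julia sketch below reproduces this long-to-wide reshaping on three made-up rows; pivot_wide is a hypothetical helper for illustration, not how DataFrames.unstack is implemented.

```julia
using Dates

# Made-up long-format rows: (period, series_code, value)
rows = [
    (Date(2018), "A.ES.BCA_BP6_EUR", 1.0),
    (Date(2018), "A.FR.BCA_BP6_EUR", 2.0),
    (Date(2019), "A.ES.BCA_BP6_EUR", 3.0),
]

# Hypothetical helper: one row per period, one column per series code,
# `missing` where a (period, series) pair has no observation
function pivot_wide(rows)
    periods = sort(unique(r[1] for r in rows))
    codes = sort(unique(r[2] for r in rows))
    row_of = Dict(p => i for (i, p) in enumerate(periods))
    col_of = Dict(c => j for (j, c) in enumerate(codes))
    wide = Matrix{Union{Missing, Float64}}(missing, length(periods), length(codes))
    for (p, c, v) in rows
        wide[row_of[p], col_of[c]] = v
    end
    return periods, codes, wide
end

periods, codes, wide = pivot_wide(rows)
println(codes)
println(wide)
```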
P.S.
Visit https://db.nomics.world/ !