pandas-datareader

Parse FamaFrenchReader 'DESCR'

Open sellingsrud opened this issue 4 years ago • 0 comments

This is a feature request.

Currently, the following:

from pandas_datareader.famafrench import FamaFrenchReader

description = FamaFrenchReader("F-F_Research_Data_Factors").read()["DESCR"]

returns the description as a single free-form string, rather than as something more easily accessible in Python, such as a dictionary.

Suggestion: Parse "DESCR" to return something akin to

{
    "title" : "F-F Research Data Factors",
    "info" : "This file bla bla",
    "names" : {"0" : "(59 rows x 4 cols)", "1" : "Annual Factors: January-December (5 rows x 4 cols)"}
}

Toy example that works as intended on "F-F_Research_Data_Factors" but assumes a very specific structure for 'descr' (a title block, an info block, and a block of table names, separated by blank lines):

from itertools import groupby

def parse_descr(descr):
    """
    Parses the currently returned 'DESCR' string into a dictionary.

    Parameters
    ----------
    descr : str
        The 'DESCR' string returned by FamaFrenchReader(...).read().

    Returns
    -------
    dict
    """
    # Split the string into lines, then group consecutive non-empty lines
    # into blocks, treating blank lines as separators
    descr = [i.strip() for i in descr.split("\n")]
    data = [list(s) for e, s in groupby(descr, key=bool) if e]

    keys = ["title", "info", "names"]

    description = {}
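    for d, k in zip(data, keys):
        if k == "names":
            # Split on the first colon only, since a table name may itself
            # contain a colon (e.g. "Annual Factors: January-December ...")
            description[k] = {
                key.strip(): value.strip()
                for key, _, value in (i.partition(":") for i in d)
            }
        else:
            # Join wrapped lines with a space and drop the title underline
            description[k] = " ".join(d).strip("- ")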

    return description
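
A slightly more tolerant variant could classify each line on its own instead of relying on exactly three blank-line-separated blocks. The sketch below is only an illustration: `parse_descr_tolerant`, `TABLE_LINE` and `SAMPLE_DESCR` are hypothetical names, and the sample string is an assumed approximation of the 'DESCR' layout, not actual reader output.

```python
import re

# Illustrative sample only; the real 'DESCR' text may differ in detail.
SAMPLE_DESCR = """F-F Research Data Factors
-------------------------

This file bla bla

  0 : (59 rows x 4 cols)
  1 : Annual Factors: January-December (5 rows x 4 cols)
"""

# Assumed shape of a table line: an integer index, a colon, then the label.
TABLE_LINE = re.compile(r"^\s*(\d+)\s*:\s*(.*)$")


def parse_descr_tolerant(descr):
    """Hypothetical variant of parse_descr that classifies lines one by one."""
    lines = [line.strip() for line in descr.splitlines()]
    lines = [line for line in lines if line]  # drop blank lines

    # The first non-empty line is taken to be the title.
    title = lines[0] if lines else ""
    info, names = [], {}
    for line in lines[1:]:
        if set(line) == {"-"}:
            continue  # underline beneath the title
        match = TABLE_LINE.match(line)
        if match:
            # Only the first colon separates index and label.
            names[match.group(1)] = match.group(2)
        else:
            info.append(line)

    return {"title": title, "info": " ".join(info), "names": names}


parsed = parse_descr_tolerant(SAMPLE_DESCR)
print(parsed["names"])
```

This still has limits: an info line that happens to start with digits followed by a colon would be misread as a table entry, so a real implementation would probably need to be checked against several datasets.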

sellingsrud, Feb 13 '21 12:02