PyPortfolioOpt [ENH] new data loaders, replace `yfinance` downloads

Get rid of yfinance. Create one csv file serving all notebooks. Closes #677

Nov 15 '25 07:11 tschm

@fkiraly this is good to be merged. Now using one flat file common for all notebooks. I have added the script to generate this file.

Nov 15 '25 09:11 tschm

Plus, could you please, please write descriptive summaries for your PR? Use AI if you need to.

Nov 15 '25 10:11 fkiraly

I don't understand. Do you want to include the notebooks in the package released?

Nov 15 '25 10:11 tschm

import os
if not os.path.isdir('data'):
    os.system('git clone https://github.com/pyportfolio/pyportfolioopt.git')
    os.chdir('PyPortfolioOpt/cookbook')

??? why would one have this in a notebook?

Nov 15 '25 10:11 tschm

I don't understand. Do you want to include the notebooks in the package released?

no, I want loader functions for the csv in the package release, and the notebooks then import the loader from the package, rather than loading the csv directly.

from pypfopt.data import load_something

my_dummydata = load_something()

All the csv manipulation is hidden underneath, ensuring that the notebook is short and does not distract from what is being shown.
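To make the idea concrete, here is a minimal sketch of such a loader. It is an assumption-laden illustration, not the package's API: `load_something` is a placeholder name, the embedded CSV text stands in for a file that would ship with the package, and the numbers are made up. It uses only the standard library so it runs anywhere.

```python
import csv
import io

# Stand-in for a CSV file shipped inside the package (hypothetical content,
# made-up numbers). A real loader would more likely read the bundled file,
# for example through importlib.resources.
_EXAMPLE_CSV = """date,AAA,BBB
2020-01-02,100.0,50.0
2020-01-03,101.5,49.5
2020-01-06,102.0,50.5
"""


def load_something(csv_text=_EXAMPLE_CSV):
    """Return prices as {ticker: [(date, price), ...]}; the name is a placeholder."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Every column except "date" is treated as a ticker.
    tickers = [name for name in reader.fieldnames if name != "date"]
    prices = {ticker: [] for ticker in tickers}
    for row in reader:
        for ticker in tickers:
            prices[ticker].append((row["date"], float(row[ticker])))
    return prices


# What a notebook cell would then contain: one import, one call.
my_dummydata = load_something()
print(sorted(my_dummydata))
```

The notebook never sees the parsing code, which is the point of the proposal.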

Nov 15 '25 22:11 fkiraly

Could be done, but I would still not put the csv file into the package. I would do something like

load_prices("my_file.csv", ticker=["A", "B", "C"], start="1990-01-01")
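A rough sketch of what a loader with that shape might do, assuming a wide CSV with a `date` column and one column per ticker, and no missing values. The function body, the `tickers` keyword, and the file layout are assumptions for illustration; nothing like this exists in the package yet.

```python
import csv
import os
import tempfile
from datetime import date


def load_prices(path, tickers=None, start=None):
    """Read a wide price CSV and return [(date, {ticker: price}), ...].

    Hypothetical helper: `start` is an ISO date string such as "1990-01-01",
    `tickers` selects columns (all columns if None).
    """
    start_date = date.fromisoformat(start) if start else None
    rows = []
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        columns = [c for c in reader.fieldnames if c != "date"]
        selected = columns if tickers is None else list(tickers)
        missing = [t for t in selected if t not in columns]
        if missing:
            raise KeyError(f"tickers not in file: {missing}")
        for row in reader:
            day = date.fromisoformat(row["date"])
            # Skip everything before the requested start date.
            if start_date is not None and day < start_date:
                continue
            rows.append((day, {t: float(row[t]) for t in selected}))
    return rows


# Demo with a throwaway file and made-up numbers.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_file.csv")
    with open(path, "w", newline="") as handle:
        handle.write("date,A,B,C\n1989-12-29,1.0,2.0,3.0\n1990-01-02,1.1,2.1,3.1\n")
    prices = load_prices(path, tickers=["A", "C"], start="1990-01-01")
print(prices)
```

With this design the data file stays outside the package and only the parsing logic is shipped.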

Nov 16 '25 03:11 tschm

Could be done, but I would still not put the csv file into the package.

How large are they? If the files are too large, could we cut them down? I think 1 MB of example data is OK for a package. Imo it is a nice user experience to have some test data shipped with a package, for testing or for learning how to use it.

Nov 16 '25 12:11 fkiraly

More like 6.6 MB at the moment. We could change the examples and use the same tickers across... and maybe less history...

Nov 16 '25 13:11 tschm

But this functionality would only be used by people "developing" the package. For users I don't see the point of having data in there. We would also be on thin ice legally, as you cannot "distribute" financial data :-) For that reason I should also remove the last year from the data...

Nov 16 '25 13:11 tschm

For users I don't see the point of having data in there

The point is running the examples, so users can play around with the python code and the data sets.

We would also be on thin ice legally as you can not "distribute" financial data :-) For that purpose I should also remove the last year from the data...

You are absolutely right! Thanks for pointing this out!!

How about we replace the downloader by some very simple simulated data then, to avoid any legal liabilities that go with using yfinance or data from it?
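As an illustration of how little is needed, here is a sketch that generates simulated prices as independent geometric random walks. The function name, tickers, drift, volatility, and start price are all arbitrary choices for the example and are not calibrated to any real market; it also uses calendar days rather than trading days.

```python
import math
import random
from datetime import date, timedelta


def simulate_prices(tickers, n_days=250, start=date(2020, 1, 1), seed=0):
    """Simulate daily prices per ticker; returns (dates, {ticker: [prices]})."""
    rng = random.Random(seed)  # fixed seed so notebooks are reproducible
    daily_drift, daily_vol, start_price = 0.0003, 0.01, 100.0
    dates = [start + timedelta(days=i) for i in range(n_days)]
    prices = {}
    for ticker in tickers:
        level, path = start_price, []
        for _ in range(n_days):
            path.append(level)
            # Multiply by exp(normal log-return), so prices stay positive.
            level *= math.exp(rng.gauss(daily_drift, daily_vol))
        prices[ticker] = path
    return dates, prices


dates, prices = simulate_prices(["AAA", "BBB"], n_days=5, seed=42)
print(dates[0], prices["AAA"][0])
```

Because the series are independent, correlation-based examples would look dull; a more realistic generator would need correlated returns.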

Nov 18 '25 21:11 fkiraly

I don't think we need to be more Catholic than the Pope. I would simply delete this year's data. The yfinance package itself also has plenty of authentic files in its test resources. We could also hash the tickers, but that would all be tedious. Nobody will complain, and if someone does, we can certainly react.
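For reference, hashing the tickers could look roughly like the sketch below. The helper name, the salt, and the tickers are made up for the example. Note that with a small, guessable set of tickers the mapping can be recovered by brute force unless the salt is kept secret, and the price series themselves may still be recognisable.

```python
import hashlib


def anonymise_ticker(ticker, salt="example-salt", length=8):
    """Map a ticker to a short, stable pseudonym via salted SHA-256."""
    digest = hashlib.sha256(f"{salt}:{ticker}".encode("utf-8")).hexdigest()
    return digest[:length].upper()


tickers = ["AAA", "BBB", "CCC"]  # made-up tickers
mapping = {t: anonymise_ticker(t) for t in tickers}
print(mapping)
```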

Nov 19 '25 05:11 tschm

(kindly asking to keep to English so other contributors can also read)

what is the action that you are suggesting?

Using a frozen extract but with the most recent data removed? I am not sure this is fine under the license terms. The liability would be with @robertmartin8 or GC.OS, so I would rather be on the very safe side.

Could we just use simulated data, or data where we know the license is ok?

Nov 25 '25 09:11 fkiraly

Can you please merge this? The tests of the notebooks you copied into main.yml are somewhat unstable. You need some of the moderate fixes.

Nov 25 '25 15:11 tschm