[ENH] new data loaders, replace `yfinance` downloads
Get rid of yfinance. Create one csv file serving all notebooks. Closes #677
@fkiraly this is good to be merged. Now using one flat file common for all notebooks. I have added the script to generate this file.
Plus, could you please, please write descriptive summaries for your PR? Use AI if you need to.
I don't understand. Do you want to include the notebooks in the package released?
import os
if not os.path.isdir('data'):
    os.system('git clone https://github.com/pyportfolio/pyportfolioopt.git')
    os.chdir('PyPortfolioOpt/cookbook')
??? why would one have this in a notebook?
I don't understand. Do you want to include the notebooks in the package released?
no, I want loader functions for the csv in the package release, and the notebooks then import the loader from the package, rather than loading the csv directly.
from pypfopt.data import load_something
my_dummydata = load_something()
All the csv manipulation is hidden underneath, ensuring that the notebook is short and does not distract from what is being shown.
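A minimal sketch of what such a loader could look like, using only the standard library. The file name `prices.csv`, the `date` column, and the `data_dir` argument are assumptions for illustration. In the package the loader would presumably default to its own data folder and return a pandas object.

```python
import csv
import tempfile
from pathlib import Path


def load_something(data_dir):
    """Hypothetical loader: read prices.csv from data_dir into columns."""
    path = Path(data_dir) / "prices.csv"  # file name is an assumption
    with path.open(newline="") as fh:
        rows = list(csv.DictReader(fh))
    dates = [row["date"] for row in rows]
    # Every column other than "date" is treated as a ticker with float prices.
    tickers = [name for name in rows[0] if name != "date"]
    prices = {t: [float(row[t]) for row in rows] for t in tickers}
    return dates, prices


# Demo with a tiny made-up file standing in for the shipped data.
demo_dir = tempfile.mkdtemp()
Path(demo_dir, "prices.csv").write_text(
    "date,A,B\n2020-01-02,10.0,20.0\n2020-01-03,10.5,19.5\n"
)
dates, prices = load_something(demo_dir)
```

The notebook then only needs the import and the one-line call, and the parsing stays out of sight.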
Could be done, but I would still not put the csv file into the package. I would do something like
load_prices("my_file.csv", ticker=["A","B","C"], start="1990-01-01")
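Roughly, as a sketch only: the signature follows the call above, while the `date` column name and the returned structure are assumptions, and the demo file is made up.

```python
import csv
import tempfile
from datetime import date
from pathlib import Path


def load_prices(path, ticker, start):
    """Return {date: {ticker: price}} for the given tickers from `start` on."""
    start_date = date.fromisoformat(start)
    out = {}
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            day = date.fromisoformat(row["date"])
            if day >= start_date:  # drop history before the start date
                out[day] = {t: float(row[t]) for t in ticker}
    return out


# Demo with a tiny made-up file; the user supplies the real csv.
tmp = Path(tempfile.mkdtemp()) / "my_file.csv"
tmp.write_text(
    "date,A,B,C\n"
    "1989-12-29,1.0,2.0,3.0\n"
    "1990-01-02,1.1,2.1,3.1\n"
)
prices = load_prices(tmp, ticker=["A", "C"], start="1990-01-01")
```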
Could be done, but I would still not put the csv file into the package.
How large are they? If the files are too large, we could cut them down. I think 1MB of example data is ok for a package. IMO it is a nice user experience to have some test data shipped with a package, for testing or learning how to use it.
More like 6.6 MB at the moment. We could change the examples and use the same tickers across... and maybe less history...
But this functionality would only be used by people "developing" the package. For users I don't see the point of having data in there. We would also be on thin ice legally as you can not "distribute" financial data :-) For that purpose I should also remove the last year from the data...
For users I don't see the point of having data in there
The point is running the examples, so users can play around with the python code and the data sets.
We would also be on thin ice legally as you can not "distribute" financial data :-) For that purpose I should also remove the last year from the data...
You are absolutely right! Thanks for pointing this out!!
How about we replace the downloader by some very simple simulated data then, to avoid any legal liabilities that go with using yfinance or data from it?
I think we do not need to be more Catholic than the Pope. I would simply delete this year's data. The yfinance package itself also has plenty of authentic files in its test resources. We could also hash the tickers, but that would all be tedious. Nobody will complain, and if someone does, we can certainly react.
(kindly asking to keep to English so other contributors can also read)
what is the action that you are suggesting?
Using a frozen extract but with the most recent data removed? Not sure if this is fine with the license terms - the liability would be with @robertmartin8 or GC.OS, so I would rather be on the very safe side.
Could we just use simulated data, or data where we know the license is ok?
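To make the simulated option concrete, here is a rough sketch using only the standard library. It draws independent log-normal daily returns per ticker. The function name, the drift and volatility values, and the use of calendar days instead of trading days are all assumptions chosen for illustration, not a proposal for the final data.

```python
import math
import random
from datetime import date, timedelta


def simulate_prices(tickers, n_days, start=date(2020, 1, 1), seed=0):
    """Hypothetical generator: one geometric random walk per ticker."""
    rng = random.Random(seed)  # fixed seed keeps the notebooks reproducible
    # Calendar days for simplicity; a real version might skip weekends.
    dates = [start + timedelta(days=i) for i in range(n_days)]
    prices = {}
    for t in tickers:
        level = 100.0  # arbitrary starting price
        series = []
        for _ in range(n_days):
            series.append(level)
            # Daily log return with small drift and 1% volatility (assumed).
            level *= math.exp(rng.gauss(0.0003, 0.01))
        prices[t] = series
    return dates, prices


dates, prices = simulate_prices(["A", "B", "C"], n_days=250)
```

The series are uncorrelated here, so the covariance-based examples would probably want a correlated version.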
can you please merge this. The tests of the notebooks you copied into main.yml are somewhat unstable. You need some of the moderate fixes