Organize pvlib.data

Open cwhanse opened this issue 5 years ago • 6 comments

pvlib.data currently contains 1) databases for module and inverter models; 2) Linke turbidity values; 3) data files for tests and examples; and 4) variable_style_rules.csv. It is accurately described as a "junk drawer." The majority of the data files are in category 3, supporting tests.

As a start, maybe create data.tests, and perhaps subfolders within tests that mirror the subfolders in pvlib.tests. And perhaps add a prefix or other text to file names to help identify where or how each file is used, e.g., PVsyst_demo.csv becomes test_sdm_pvsyst_demo.csv.
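To make the naming idea concrete, here is a minimal sketch of such a renaming rule. The helper name `proposed_test_name` and the choice to lowercase the whole file name are assumptions for illustration, based only on the single PVsyst_demo.csv example above; nothing like this exists in pvlib.

```python
from pathlib import PurePath


def proposed_test_name(filename: str, prefix: str) -> str:
    """Prepend a usage prefix and lowercase the original file name.

    Hypothetical helper: the prefix (e.g. "test_sdm") would have to be
    chosen by hand for each file, depending on where the file is used.
    """
    path = PurePath(filename)
    return f"{prefix}_{path.stem.lower()}{path.suffix.lower()}"


# The example from the comment above.
print(proposed_test_name("PVsyst_demo.csv", "test_sdm"))
# prints: test_sdm_pvsyst_demo.csv
```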

cwhanse avatar Sep 09 '20 17:09 cwhanse

Mirroring the structure would be an improvement.

Another option to consider is moving the subpackage tests into the subpackage, along with a data subdirectory within that test directory. For example:

pvlib/data  # databases, linke turbidity, anything a user might need
pvlib/tests  # test_atmosphere.py, etc
pvlib/tests/data  # singleaxis_tracker_wslope.csv, etc
pvlib/iotools/tests  # test_tmy.py, etc
pvlib/iotools/tests/data  # pvgis_tmy_test.dat, etc
pvlib/ivtools/tests
pvlib/ivtools/tests/data

I proposed a similar structure when we created pvlib/tests/iotools. I was outvoted but I still think it's better!
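One practical consequence of that layout is that every test module can find its data relative to its own location. A minimal sketch, assuming a hypothetical helper `data_dir_for` (not an existing pvlib function) and the directory names from the listing above:

```python
from pathlib import Path


def data_dir_for(test_module_file: str) -> Path:
    """Return the "data" directory that sits next to a test module.

    Hypothetical helper: assumes the proposed layout, where each
    tests/ directory has its own data/ subdirectory.
    """
    return Path(test_module_file).parent / "data"


# Inside a test module this would typically be called as
# data_dir_for(__file__); a literal path is used here for illustration.
tmy_data = data_dir_for("pvlib/iotools/tests/test_tmy.py")
print(tmy_data.as_posix())
# prints: pvlib/iotools/tests/data
```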

wholmgren avatar Sep 09 '20 17:09 wholmgren

I'm in favor of pvlib/tests/data, etc. rather than pvlib/data/tests.

cwhanse avatar Sep 09 '20 18:09 cwhanse

68 files to organize... I propose splitting up the work to make it manageable. I'd say the first step is to categorize each file. Feel free to react to this message with a 👍 if you want to contribute to that. Next week I'll split the files into ranges and assign one to each volunteer by mentioning you in this message.

But for those brave enough, you can start on it now: https://docs.google.com/spreadsheets/d/12LeEFa9-wRqc3v7utfgcTk96KTmaWfhHSPkLx6K0QhY/edit?usp=sharing

There are five categories: the four Cliff listed in this issue plus one for unknown usage; multiple can be selected for each entry. My approach would be to look up where these files are mentioned and select the appropriate labels. This could be automated, but I don't feel like overengineering today, and automation wouldn't account for files that aren't mentioned anywhere (if any).
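For anyone who does want to automate the lookup, here is a rough sketch of what it could look like. Everything in it is an assumption: the function names, the list of scanned file suffixes, and the three simplified labels, which do not match the spreadsheet's five categories. It only does a plain substring search for each file name, so it will miss file names that are built dynamically.

```python
from pathlib import Path


def find_mentions(data_dir, source_root, source_suffixes=(".py", ".rst", ".ipynb")):
    """Map each data file name to the source files that mention it.

    Hypothetical helper: matches by plain substring search on the
    bare file name, which is a simplification.
    """
    data_dir = Path(data_dir)
    source_root = Path(source_root)
    names = sorted(p.name for p in data_dir.rglob("*") if p.is_file())
    mentions = {name: [] for name in names}
    for src in sorted(source_root.rglob("*")):
        if not src.is_file() or src.suffix not in source_suffixes:
            continue
        text = src.read_text(encoding="utf-8", errors="ignore")
        for name in names:
            if name in text:
                mentions[name].append(src.relative_to(source_root).as_posix())
    return mentions


def label(paths):
    """Assign a simplified, hypothetical label from the mentioning paths."""
    if not paths:
        return "unknown"  # not mentioned anywhere: needs manual review
    if all("tests" in Path(p).parts for p in paths):
        return "tests only"
    return "used outside tests"
```

It could be run from a checkout with something like `{name: label(paths) for name, paths in find_mentions("pvlib/data", ".").items()}`. Files labeled "unknown" are exactly the ones not mentioned anywhere, so they would still need a manual look.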

React 👎 if you are against doing it this way (and potentially have a better idea)

echedey-ls avatar Oct 20 '24 17:10 echedey-ls

@echedey-ls can you add "Lookup table" or something like that to the drop-down options, for files like the CEC module parameters?

cwhanse avatar Oct 21 '24 20:10 cwhanse

This is a summary of the classification, available in the spreadsheet's third sheet.

[Image: summary of the classification]

echedey-ls avatar Oct 22 '24 00:10 echedey-ls

> This is a summary of the classification, available in the spreadsheet's third sheet.

@echedey-ls This looks good to me.

Should we create a sub-folder for the files only used for testing, i.e., the files that can be excluded?

AdamRJensen avatar Oct 22 '24 02:10 AdamRJensen