pvlib-python icon indicating copy to clipboard operation
pvlib-python copied to clipboard

Add function to retrieve example dataset paths

Open echedey-ls opened this issue 2 years ago • 9 comments

Done:

  • [x] Closes #924
  • [x] I am familiar with the contributing guidelines
  • [x] Tests added
  • [x] New code is fully documented. Includes numpydoc compliant docstrings, examples, and comments where necessary.
  • [x] Pull request is nearly complete and ready for detailed review.
  • [x] Updates entries in docs/sphinx/source/reference for API changes. ~About this one, there isn't a tools.rst file, so I believe it doesn't need referencing.~

I'll wait for milestone assignment:

  • [x] Adds description and name entries in the appropriate "what's new" file in docs/sphinx/source/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).
  • [x] Maintainer: Appropriate GitHub Labels (including remote-data) and Milestone are assigned to the Pull Request and linked Issue.

Adds a get_example_dataset_path() function to pvlib.tools that returns a filepath to a dataset bundled with pvlib. See linked issue above.

Here is the documentation link [UPDATED 2023-06-18].

echedey-ls avatar Jun 07 '23 10:06 echedey-ls

This issue is related to #1056. However, I haven't addressed any of it since in first place I don't know which purpose each file has. Tweaking this new function would leverage work put into changing the structure.

echedey-ls avatar Jun 07 '23 11:06 echedey-ls

Here is the documentation link [UPDATED].

echedey-ls avatar Jun 09 '23 07:06 echedey-ls

This should be ok now. Please have a look at the docs. I've created a table to leverage work on the other issue. Maybe it's better to position it on another paragraph.

Edit: oh, and I ended up adding the parameter, since it won't really impact the use, someone more experienced may be able to organize the data folder.

echedey-ls avatar Jun 10 '23 11:06 echedey-ls

Good to know about importlib.resources.files! It does require python 3.9 though.

The function name get_test_dataset_path implies that it's only for tests, yet it's used for all 3 of those items.

I suggested including test in the function name in an attempt to prevent confusing it for something like an iotools.get_ function; the thought was to try to make it clear that it's limited to only a select set of special files, not a general purpose data tool. But if we're looking to draw a line between test (as in pvlib.tests) and other types of data then I can see that it could be confusing.

But I don't see why one function for three types of data files might cause problems later on. If all the function does is figure_out_where_pvlib_is() + path_parameter, what's it matter whether the file is LinkeTurbidities.h5 or a testing file?

kandersolar avatar Jun 16 '23 19:06 kandersolar

There's an importlib-resources backport for older python. We already use this approach for importlib-metadata for python < 3.8. This realpython example might be helpful too.

I think your original list of pros to a function are still valid. There the focus is on a function for example code and maybe user onboarding that separates public resources from package structure. If the user-facing function only does that then it might be a local win. On the other hand, it can also be good to propagate standard python patterns in example code so it's less clear that it's a global win. I think the internal code should keep it simple and stick to importlib-resources.

wholmgren avatar Jun 16 '23 20:06 wholmgren

I understand the issue. We can limit the scope of this function until #1056 is addressed. I agree with @wholmgren criteria on that this PR should only satisfy user needs. I will revert unrelated changes.

I suggested including test in the function name in an attempt to prevent confusing it for something like an iotools.get_ function

@kandersolar what is your opinion on renaming it to get_example_dataset_path (or get_pvlib_example_dataset_path, get_bundled_dataset_path, maybe consider other synonym of 'bundled')?

I'd like to bring attention to two points:

  • Importlib imports the whole module with importlib.resources.files('pvlib'). I don't know how much overhead that introduces compared to using __path__[0] or __file__, but I expect it to be a little higher. Not much, but just to consider the micro optimization.
  • Tests are being shipped with pvlib when it's installed. Is that on purpose?

echedey-ls avatar Jun 17 '23 10:06 echedey-ls

This should be fine to go. Renamed to get_example_dataset_path and use cases are only in the examples.

echedey-ls avatar Jun 18 '23 18:06 echedey-ls

Importlib imports the whole module with importlib.resources.files('pvlib'). I don't know how much overhead that introduces compared to using __path__[0] or __file__, but I expect it to be a little higher. Not much, but just to consider the micro optimization.

It seems to use the same sys.modules cache as import does, so as long as we are assuming pvlib is already imported, or will be imported later, then I don't think there is any significant penalty.

Tests are being shipped with pvlib when it's installed. Is that on purpose?

Yes, see #473. There is a non-pvlib discussion here on whether including tests in sdists and wheels is a good idea in general (tl;dr opinions vary).

kandersolar avatar Jun 21 '23 17:06 kandersolar

That's good to know @kandersolar !

echedey-ls avatar Jun 21 '23 18:06 echedey-ls

Closing this due to inactivity. If there's interest in the future, feel free to ping me.

echedey-ls avatar Jul 13 '24 11:07 echedey-ls