ENH: create a dataset of pre-registered motors.
Pull request type
- [x] Code changes (bugfix, features)
- [x] Code maintenance (refactoring, formatting, tests)
- [x] ReadMe, Docs and GitHub updates
- [ ] Other (please describe):
Checklist
- [x] Tests for the changes have been added (if needed)
- [x] Docs have been reviewed and added / updated
- [x] Lint (
black rocketpy/ tests/) has passed locally - [ ] All tests (
pytest tests -m slow --runslow) have passed locally - [x]
CHANGELOG.mdhas been updated (if relevant)
Current behavior
The user had to add its own .eng file to use the method GenericMotor.load_from_eng_file() to create a GenericMotor from it, and RocketPy didn't have a dataset of pre-registered motors.
New behavior
RocketPy now has its own dataset of pre-registered motors, containing 63 .eng files, located in rocketpy/datasets/motors/.
Three functions are now present in utilities.py to help:
list_motors_dataset(): returns a list of all available motors names;load_motor_from_dataset(): load a GenericMotor from a motor name from the dataset;show_motors_dataset(): prints the list of available motors, useful for quick inspection in terminals and Jupyter notebooks.
Breaking change
- [ ] Yes
- [x] No
Additional information
The user guide has documentation for the three new functions:
- The following test didn't pass locally:
FAILED tests/integration/test_environment.py::test_rap_atmosphere - ValueError: The chosen launch time '2025-05-05-02: UTC' is not available in the provided file. Please choose a time within the range of the file, which starts at '2025-05-06-01 UTC'.
To be discussed
- Should this dataset be installed with the default installation of RocketPy using
pip? For now, the size of the dataset isn't big, so the easiest path is to just add it in the repo. However, in the future, it may become big enough to make it optional. Preparing the repo right now could be a good idea -- but it would demand changes in the PyPI pipeline.
Should this dataset be installed with the default installation of RocketPy using pip? For now, the size of the dataset isn't big, so the easiest path is to just add it in the repo. However, in the future, it may become big enough to make it optional. Preparing the repo right now could be a good idea -- but it would demand changes in the PyPI pipeline.
Well, I believe it would be nice if the installation was optional. I think it is worth to search how to do that. It would be nice to ensure a lighter installation by default.
One my side, there are two main concerns: 1 - You had to duplicate the .eng files in the repo. Isn't there any way wehre you don't need the new rocketpy/dataset folder, and instead simply download the data/ folder? 2 - Is there any particular reason for creating the new functions as utilities instead of methods of the GenericMotor class? (I'm not saying one option is better than other, I'm just trying to understand)
@Gui-FernandesBR, ok, I'll search for a nice way to make it optional and discuss it in the weekly. I know that it will involve another member of the team that has access to the PyPI configuration.
Regarding your concerns:
1 - There are ways, but the team and I discussed in the weekly that creating the datasets directory would be a good choice because:
- It separates what is meant to be used programmatically, as a resource (the
datasets/directory), from what is meant to be used in examples or sandbox exploration (thedata/directory), organizing it in a structured way (in this case, only having .eng files for motors, and nothing else); - It allows RocketPy to reference the dataset in other parts of the code using
rocketpy.datasets. I believe it wouldn't be good to reference thedatafolder like that, since it's not inside therocketpydirectory; - It is inspired in other well-stablished projects, like the color palettes in seaborn, or the datasets from scikit-learn (
sklearn.datasets).
2 - I thought it would fit better in the utilities because there may be future functions that also use the datasets directory, but are not necessarily related to the GenericMotor class. However, the fact that I've created the documentation inside the GenericMotor part in the user guide looks a little bit like an inconsistency haha, even though I thought it would be better for it to be there when thinking about its use case.
Codecov Report
:x: Patch coverage is 79.31034% with 6 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 80.13%. Comparing base (4df0b38) to head (25314de).
:warning: Report is 60 commits behind head on develop.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| rocketpy/utilities.py | 79.31% | 6 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## develop #814 +/- ##
===========================================
+ Coverage 79.11% 80.13% +1.02%
===========================================
Files 96 97 +1
Lines 11575 12049 +474
===========================================
+ Hits 9158 9656 +498
+ Misses 2417 2393 -24
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
@leogrosa please try to make the dataset optional on the installation. We don't have anything of the sort in the repo right now so you will need to research a bit on how to do it
@Gui-FernandesBR @MateusStano
So, I did some research and found that it's not possible to make part of this repository optional when installing via pip.
The only way to achieve that would be by hosting the files elsewhere. This could be done in many ways, but I believe the most scalable and maintainable option would be to create a separate repository under the RocketPy-Team organization. We could then include it as an optional dependency in RocketPy, using pip’s extras_require mechanism. That way, we avoid implementing our own logic for download handling, file hosting, retries, etc — all of which pip already handles very well.
We could create a repository named rocketpy-datasets, but that would lead to the same problem we're trying to avoid: increasing size over time. For instance, if a user wants to install only the motors dataset, they'd still have to download the entire dataset repository, including unrelated data (e.g., weather, atmospheric profiles...).
So, my proposal is to create a dedicated repository for each dataset. For example:
rocketpy-motor-datasetrocketpy-weather-dataset- ...
This granularity would allow us to keep each dataset truly optional, and help prevent unnecessary package size growth in the long term.
The only downside is having to maintain multiple small repositories. However, since these are intended to store only static data files, setup should be straightforward. I can even prepare a private “dataset repository template” to simplify this process for future datasets.
What do you guys think?
@Gui-FernandesBR @MateusStano
So, I did some research and found that it's not possible to make part of this repository optional when installing via pip.
The only way to achieve that would be by hosting the files elsewhere. This could be done in many ways, but I believe the most scalable and maintainable option would be to create a separate repository under the RocketPy-Team organization. We could then include it as an optional dependency in RocketPy, using pip’s
extras_requiremechanism. That way, we avoid implementing our own logic for download handling, file hosting, retries, etc — all of which pip already handles very well.We could create a repository named
rocketpy-datasets, but that would lead to the same problem we're trying to avoid: increasing size over time. For instance, if a user wants to install only the motors dataset, they'd still have to download the entire dataset repository, including unrelated data (e.g., weather, atmospheric profiles...).So, my proposal is to create a dedicated repository for each dataset. For example:
rocketpy-motor-datasetrocketpy-weather-dataset- ...
This granularity would allow us to keep each dataset truly optional, and help prevent unnecessary package size growth in the long term.
The only downside is having to maintain multiple small repositories. However, since these are intended to store only static data files, setup should be straightforward. I can even prepare a private “dataset repository template” to simplify this process for future datasets.
What do you guys think?
Thanks for the update, Leo. I need to think a bit more about the problem before answering you.
We will create another repo to serve as a "dataset" for motors.