RocketPy icon indicating copy to clipboard operation
RocketPy copied to clipboard

ENH: create a dataset of pre-registered motors.

Open leogrosa opened this issue 7 months ago • 6 comments

Pull request type

  • [x] Code changes (bugfix, features)
  • [x] Code maintenance (refactoring, formatting, tests)
  • [x] ReadMe, Docs and GitHub updates
  • [ ] Other (please describe):

Checklist

  • [x] Tests for the changes have been added (if needed)
  • [x] Docs have been reviewed and added / updated
  • [x] Lint (black rocketpy/ tests/) has passed locally
  • [ ] All tests (pytest tests -m slow --runslow) have passed locally
  • [x] CHANGELOG.md has been updated (if relevant)

Current behavior

The user had to add its own .eng file to use the method GenericMotor.load_from_eng_file() to create a GenericMotor from it, and RocketPy didn't have a dataset of pre-registered motors.

New behavior

RocketPy now has its own dataset of pre-registered motors, containing 63 .eng files, located in rocketpy/datasets/motors/. Three functions are now present in utilities.py to help:

  • list_motors_dataset(): returns a list of all available motors names;
  • load_motor_from_dataset(): load a GenericMotor from a motor name from the dataset;
  • show_motors_dataset(): prints the list of available motors, useful for quick inspection in terminals and Jupyter notebooks.

Breaking change

  • [ ] Yes
  • [x] No

Additional information

The user guide has documentation for the three new functions: Captura de Tela 2025-05-05 às 23 48 11 Captura de Tela 2025-05-05 às 23 49 17 Captura de Tela 2025-05-05 às 23 49 44

  • The following test didn't pass locally:
FAILED tests/integration/test_environment.py::test_rap_atmosphere - ValueError: The chosen launch time '2025-05-05-02: UTC' is not available in the provided file. Please choose a time within the range of the file, which starts at '2025-05-06-01 UTC'.

To be discussed

  • Should this dataset be installed with the default installation of RocketPy using pip? For now, the size of the dataset isn't big, so the easiest path is to just add it in the repo. However, in the future, it may become big enough to make it optional. Preparing the repo right now could be a good idea -- but it would demand changes in the PyPI pipeline.

leogrosa avatar May 06 '25 02:05 leogrosa

Should this dataset be installed with the default installation of RocketPy using pip? For now, the size of the dataset isn't big, so the easiest path is to just add it in the repo. However, in the future, it may become big enough to make it optional. Preparing the repo right now could be a good idea -- but it would demand changes in the PyPI pipeline.

Well, I believe it would be nice if the installation was optional. I think it is worth to search how to do that. It would be nice to ensure a lighter installation by default.

One my side, there are two main concerns: 1 - You had to duplicate the .eng files in the repo. Isn't there any way wehre you don't need the new rocketpy/dataset folder, and instead simply download the data/ folder? 2 - Is there any particular reason for creating the new functions as utilities instead of methods of the GenericMotor class? (I'm not saying one option is better than other, I'm just trying to understand)

Gui-FernandesBR avatar May 06 '25 11:05 Gui-FernandesBR

@Gui-FernandesBR, ok, I'll search for a nice way to make it optional and discuss it in the weekly. I know that it will involve another member of the team that has access to the PyPI configuration.

Regarding your concerns:

1 - There are ways, but the team and I discussed in the weekly that creating the datasets directory would be a good choice because:

  • It separates what is meant to be used programmatically, as a resource (the datasets/ directory), from what is meant to be used in examples or sandbox exploration (the data/ directory), organizing it in a structured way (in this case, only having .eng files for motors, and nothing else);
  • It allows RocketPy to reference the dataset in other parts of the code using rocketpy.datasets. I believe it wouldn't be good to reference the data folder like that, since it's not inside the rocketpy directory;
  • It is inspired in other well-stablished projects, like the color palettes in seaborn, or the datasets from scikit-learn (sklearn.datasets).

2 - I thought it would fit better in the utilities because there may be future functions that also use the datasets directory, but are not necessarily related to the GenericMotor class. However, the fact that I've created the documentation inside the GenericMotor part in the user guide looks a little bit like an inconsistency haha, even though I thought it would be better for it to be there when thinking about its use case.

leogrosa avatar May 06 '25 15:05 leogrosa

Codecov Report

:x: Patch coverage is 79.31034% with 6 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 80.13%. Comparing base (4df0b38) to head (25314de). :warning: Report is 60 commits behind head on develop.

Files with missing lines Patch % Lines
rocketpy/utilities.py 79.31% 6 Missing :warning:
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #814      +/-   ##
===========================================
+ Coverage    79.11%   80.13%   +1.02%     
===========================================
  Files           96       97       +1     
  Lines        11575    12049     +474     
===========================================
+ Hits          9158     9656     +498     
+ Misses        2417     2393      -24     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar May 16 '25 11:05 codecov[bot]

@leogrosa please try to make the dataset optional on the installation. We don't have anything of the sort in the repo right now so you will need to research a bit on how to do it

MateusStano avatar May 17 '25 10:05 MateusStano

@Gui-FernandesBR @MateusStano

So, I did some research and found that it's not possible to make part of this repository optional when installing via pip.

The only way to achieve that would be by hosting the files elsewhere. This could be done in many ways, but I believe the most scalable and maintainable option would be to create a separate repository under the RocketPy-Team organization. We could then include it as an optional dependency in RocketPy, using pip’s extras_require mechanism. That way, we avoid implementing our own logic for download handling, file hosting, retries, etc — all of which pip already handles very well.

We could create a repository named rocketpy-datasets, but that would lead to the same problem we're trying to avoid: increasing size over time. For instance, if a user wants to install only the motors dataset, they'd still have to download the entire dataset repository, including unrelated data (e.g., weather, atmospheric profiles...).

So, my proposal is to create a dedicated repository for each dataset. For example:

  • rocketpy-motor-dataset
  • rocketpy-weather-dataset
  • ...

This granularity would allow us to keep each dataset truly optional, and help prevent unnecessary package size growth in the long term.

The only downside is having to maintain multiple small repositories. However, since these are intended to store only static data files, setup should be straightforward. I can even prepare a private “dataset repository template” to simplify this process for future datasets.

What do you guys think?

leogrosa avatar May 17 '25 18:05 leogrosa

@Gui-FernandesBR @MateusStano

So, I did some research and found that it's not possible to make part of this repository optional when installing via pip.

The only way to achieve that would be by hosting the files elsewhere. This could be done in many ways, but I believe the most scalable and maintainable option would be to create a separate repository under the RocketPy-Team organization. We could then include it as an optional dependency in RocketPy, using pip’s extras_require mechanism. That way, we avoid implementing our own logic for download handling, file hosting, retries, etc — all of which pip already handles very well.

We could create a repository named rocketpy-datasets, but that would lead to the same problem we're trying to avoid: increasing size over time. For instance, if a user wants to install only the motors dataset, they'd still have to download the entire dataset repository, including unrelated data (e.g., weather, atmospheric profiles...).

So, my proposal is to create a dedicated repository for each dataset. For example:

  • rocketpy-motor-dataset
  • rocketpy-weather-dataset
  • ...

This granularity would allow us to keep each dataset truly optional, and help prevent unnecessary package size growth in the long term.

The only downside is having to maintain multiple small repositories. However, since these are intended to store only static data files, setup should be straightforward. I can even prepare a private “dataset repository template” to simplify this process for future datasets.

What do you guys think?

Thanks for the update, Leo. I need to think a bit more about the problem before answering you.

Gui-FernandesBR avatar May 18 '25 21:05 Gui-FernandesBR

We will create another repo to serve as a "dataset" for motors.

image

Gui-FernandesBR avatar Nov 09 '25 01:11 Gui-FernandesBR