ENH: improve CSV processing
Pull request type
- [X] Code changes (bugfix, features)
Checklist
- [ ] Tests for the changes have been added (if needed)
- [X] Docs have been reviewed and added / updated
- [ ] Lint (`black rocketpy/ tests/`) has passed locally
- [ ] All tests (`pytest --runslow`) have passed locally
- [ ] `CHANGELOG.md` has been updated (if relevant)
Current behavior
The current code isn't able to deal with CSV files that contain a header.
New behavior
Now, when we create an instance of the Function class, we can pass a CSV file with a header as an argument: the code will process the CSV file (dropping NaN values) and handle the header.
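As a rough illustration of the described behavior (not the exact code in this branch), a minimal sketch could look like the following, where `load_csv_source` is a hypothetical helper: a header is detected by a failing `float()` cast on the first row, and rows containing NaN values are dropped.

```python
import csv
import math

def load_csv_source(path):
    """Hypothetical helper: read a CSV file, skip a header row if present,
    and drop any rows that contain NaN values."""
    rows = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        for i, row in enumerate(reader):
            if not row:
                continue  # skip blank lines
            try:
                values = [float(cell) for cell in row]
            except ValueError:
                # A failed float() cast on the first row suggests a header; skip it.
                if i == 0:
                    continue
                raise ValueError(f"Malformed row {i} in {path}: {row}")
            # Drop rows with NaN entries.
            if any(math.isnan(v) for v in values):
                continue
            rows.append(values)
    return rows
```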
Breaking change
- [X] No
Additional information
Could you elaborate a bit more on why this implementation is an advantage over the current one? Forgive me if I am mistaken, but my interpretation is that:
- The current way the CSVs are read uses `numpy.loadtxt` in a `try-except` clause: numpy tries to convert the CSV to an array of floats; if there is an error, we skip the first header line. If there is still an error, the input is likely badly formatted.
- This new implementation creates additional functions to process the data: the headers are checked by trying a `float()` cast.
This seems like the same thing to me, but in the first case we can delegate the trouble of reading to numpy and avoid maintaining these new functions.
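For reference, a minimal sketch of the `try-except` pattern described above (simplified, not the exact rocketpy source):

```python
import numpy as np

def load_with_numpy(path):
    # Let numpy try to parse the whole file as floats first.
    try:
        return np.loadtxt(path, delimiter=",")
    except ValueError:
        try:
            # First attempt failed, so assume the first line is a header.
            return np.loadtxt(path, delimiter=",", skiprows=1)
        except ValueError as err:
            # Still failing: the input is likely badly formatted.
            raise ValueError(f"The file {path} seems badly formatted") from err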
We recently had PRs (#485 and #495) to handle CSV headers, and there is even a test for it: `test_func_from_csv_with_header`.
Hello @phmbressan, this code is not only able to skip the first row but also to drop NaN values, which is not the case in the current code. Moreover, the current code is not very efficient at dealing with these cases (from my perspective). Try the following CSV file that @Gui-FernandesBR gave me: the current code throws an error, whereas mine deals with it perfectly. Let me know if you want a complete report on this code! test_dataset_with_header.csv
The tests not passing worries me a bit, but I couldn't find enough time to investigate it.
If the problem we are trying to solve is reading .csv files that have NaN values, I think we should consider changing `np.loadtxt` to `np.genfromtxt` (see https://numpy.org/doc/stable/reference/generated/numpy.genfromtxt.html).
This function is a bit more complex and can handle missing values.
Relying on numpy may be more beneficial than using a pure-Python approach, both in terms of maintenance and speed. But this is something that can be discussed.
https://stackoverflow.com/questions/20245593/difference-between-numpy-genfromtxt-and-numpy-loadtxt-and-unpack
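A minimal sketch of this `np.genfromtxt` alternative (the function name and the single-mask approach are illustrative, assuming a two-or-more-column CSV, not an agreed design): `genfromtxt` parses unconvertible cells such as header text as NaN instead of raising, so the header and any incomplete rows can be dropped with one mask.

```python
import numpy as np

def load_with_genfromtxt(path):
    # Non-numeric cells (e.g. a text header) are parsed as NaN instead of
    # raising an error.
    data = np.genfromtxt(path, delimiter=",")
    # Keep only rows where every column is a number; this discards both a
    # header row (parsed as NaN) and rows with missing values.
    return data[~np.isnan(data).any(axis=1)]
```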
Closed as not finished. The branch is still available in the repo, though.