RocketPy icon indicating copy to clipboard operation
RocketPy copied to clipboard

ENH : improvement csv processing

Open kalounis opened this issue 2 years ago • 4 comments

Pull request type

  • [X] Code changes (bugfix, features)

Checklist

  • [ ] Tests for the changes have been added (if needed)
  • [X] Docs have been reviewed and added / updated
  • [ ] Lint (black rocketpy/ tests/) has passed locally
  • [ ] All tests (pytest --runslow) have passed locally
  • [ ] CHANGELOG.md has been updated (if relevant)

Current behavior

The current code isn't able to deal with CSV files that contain a header.

New behavior

Now, when we create an instance of the Function Class, we can give for argument a CSV file with a header : the code will process this CSV file (drop NaN values) and will be able to deal with the header.

Breaking change

  • [X] No

Additional information

Enter text here...

kalounis avatar Dec 19 '23 20:12 kalounis

Could you elaborate a bit more on why is this implementation an advantage over the current one, forgive me if I am mistaken, but my interpretation is that:

  • The current way the CSVs are read follows the numpy.loadtxt in a try-except clause: numpy will try to convert the CSV to an array of floats, if there is an error, we skip the first header line. If there is still an error, the input is likely bad formatted.
  • This new implementation creates aditional functions to process the data: the headers will be checked by trying a float() cast.

This seems to the same thing to me, but in the first case we can delegate to numpy the trouble of reading and not maintain these new functions.

We had recently a PR (#485 and #495) to handle CSV headers and there is even a test for it: test_func_from_csv_with_header.

phmbressan avatar Dec 19 '23 21:12 phmbressan

Hello @phmbressan, this code is not only able to skip the first row but also to drop NaN values which it is not the case in the current code. Moreover, the current code is not very efficient dealing with these cases (from my perspective). Try with the following CSV file that @Gui-FernandesBR gave me. The current code throws an error whereas mine deals perfectly with it. Let me know if you want a complete report on this code! test_dataset_with_header.csv

kalounis avatar Dec 19 '23 21:12 kalounis

tests not passing kinda worries me a bit but I couldn't find enough time to investigate it.

Gui-FernandesBR avatar Dec 22 '23 01:12 Gui-FernandesBR

If the problem we are trying to solve is to read .csv files that have NaN values, I think we should consider Changning the np.loadtxt to the np.genfromtxt (see https://numpy.org/doc/stable/reference/generated/numpy.genfromtxt.html).

This function is a bit more complex and can

Relying in numpy may be more beneficial than using the full pythonic approach, both in terms of maintenance and speed. But this is something that can be discussed.

https://stackoverflow.com/questions/20245593/difference-between-numpy-genfromtxt-and-numpy-loadtxt-and-unpack

Gui-FernandesBR avatar Jan 25 '24 04:01 Gui-FernandesBR

Closed as not finished. The branch is still available in the repo tho.

Gui-FernandesBR avatar Jun 29 '24 15:06 Gui-FernandesBR