ENH: improve CSV processing
Pull request type
- [X] Code changes (bugfix, features)
Checklist
- [ ] Tests for the changes have been added (if needed)
- [X] Docs have been reviewed and added / updated
- [ ] Lint (`black rocketpy/ tests/`) has passed locally
- [ ] All tests (`pytest --runslow`) have passed locally
- [ ] `CHANGELOG.md` has been updated (if relevant)
Current behavior
The current code isn't able to deal with CSV files that contain a header.
New behavior
Now, when we create an instance of the Function class, we can pass a CSV file with a header as an argument: the code will process the CSV file (dropping NaN values) and handle the header.
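As a rough illustration of the described behavior (not the exact code in this branch), a minimal sketch could look like the following, where `load_csv_source` is a hypothetical helper: a header is detected by a failing `float()` cast on the first row, and rows containing NaN values are dropped.

```python
import csv
import math

def load_csv_source(path):
    """Hypothetical helper: read a CSV file, skip a header row if present,
    and drop any rows that contain NaN values."""
    rows = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        for i, row in enumerate(reader):
            if not row:
                continue  # skip blank lines
            try:
                values = [float(cell) for cell in row]
            except ValueError:
                # A failed float() cast on the first row suggests a header; skip it.
                if i == 0:
                    continue
                raise ValueError(f"Malformed row {i} in {path}: {row}")
            # Drop rows with NaN entries.
            if any(math.isnan(v) for v in values):
                continue
            rows.append(values)
    return rows
```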
Breaking change
- [X] No
Additional information
Could you elaborate a bit more on why this implementation is an advantage over the current one? Forgive me if I am mistaken, but my interpretation is that:
- The current way the CSVs are read uses `numpy.loadtxt` in a `try-except` clause: numpy tries to convert the CSV to an array of floats; if there is an error, we skip the first header line. If there is still an error, the input is likely badly formatted.
- This new implementation creates additional functions to process the data: the headers are checked by trying a `float()` cast.
This seems like the same thing to me, but in the first case we can delegate the trouble of reading to numpy and avoid maintaining these new functions.
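For reference, a minimal sketch of the `try-except` pattern described above (simplified, not the exact rocketpy source):

```python
import numpy as np

def load_with_numpy(path):
    # Let numpy try to parse the whole file as floats first.
    try:
        return np.loadtxt(path, delimiter=",")
    except ValueError:
        try:
            # First attempt failed, so assume the first line is a header.
            return np.loadtxt(path, delimiter=",", skiprows=1)
        except ValueError as err:
            # Still failing: the input is likely badly formatted.
            raise ValueError(f"The file {path} seems badly formatted") from err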
We recently had PRs (#485 and #495) to handle CSV headers, and there is even a test for it: `test_func_from_csv_with_header`.
Hello @phmbressan, this code is not only able to skip the first row but also to drop NaN values, which is not the case in the current code. Moreover, the current code is not very efficient at dealing with these cases (from my perspective). Try the following CSV file that @Gui-FernandesBR gave me: the current code throws an error, whereas mine deals with it perfectly. Let me know if you want a complete report on this code! test_dataset_with_header.csv
The tests not passing worries me a bit, but I couldn't find enough time to investigate it.
If the problem we are trying to solve is reading .csv files that have NaN values, I think we should consider changing `np.loadtxt` to `np.genfromtxt` (see https://numpy.org/doc/stable/reference/generated/numpy.genfromtxt.html).
This function is a bit more complex and can handle missing values.
Relying on numpy may be more beneficial than using a pure-Python approach, both in terms of maintenance and speed. But this is something that can be discussed.
https://stackoverflow.com/questions/20245593/difference-between-numpy-genfromtxt-and-numpy-loadtxt-and-unpack
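A minimal sketch of this `np.genfromtxt` alternative (the function name and the single-mask approach are illustrative, assuming a two-or-more-column CSV, not an agreed design): `genfromtxt` parses unconvertible cells such as header text as NaN instead of raising, so the header and any incomplete rows can be dropped with one mask.

```python
import numpy as np

def load_with_genfromtxt(path):
    # Non-numeric cells (e.g. a text header) are parsed as NaN instead of
    # raising an error.
    data = np.genfromtxt(path, delimiter=",")
    # Keep only rows where every column is a number; this discards both a
    # header row (parsed as NaN) and rows with missing values.
    return data[~np.isnan(data).any(axis=1)]
```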
Closed as not finished. The branch is still available in the repo, though.