Exception: Assert_failure ("src/base/misc/owl_dataframe.ml", 753, 2).
I was trying out the dataframe examples and slightly rewrote the first function to handle some price data from Binance (11 columns, all floats). When I ran that function on a file of ETC/BTC price data with 1.2 million lines, it returned:
Exception: Assert_failure ("src/base/misc/owl_dataframe.ml", 753, 2).
I also ran the function on a different (much smaller) file of price data and it worked just fine.
The function:
let csv_parser path =
  let fname = path ^ "ETC-BTC.csv" in
  let types = [|"f";"f";"f";"f";"f";"f";"f";"f";"f";"f";"f"|] in
  let df = Dataframe.of_csv ~sep:',' ~types fname in
  Owl_pretty.pp_dataframe Format.std_formatter df
Here's a sample of the price data:
open_time,open,high,low,close,volume,close_time,quote_asset_volume,number_of_trades,taker_buy_base_asset_volume,taker_buy_quote_asset_volume,ignore
1507800720000,0.00223,0.00223,0.00223,0.00223,10.0,1507800779999,0.0223,1,10.0,0.0223,900.48
What should I be doing differently in order to use this with larger files?
We should add a task to remove all asserts from the code. My guess is that there is a row with a different length. Can you check if I am wrong?
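One way to test the row-length guess without going through Owl is to count the fields on every line and compare against the header. The sketch below uses only the OCaml standard library; `mismatched_rows` is a hypothetical helper name, and the naive split assumes no field contains a quoted comma. It is also worth comparing the result with the `types` array: the sample header in the original post lists 12 column names, while `types` has 11 entries.

```ocaml
(* Hypothetical helper, not part of Owl: returns the header's field count and
   a list of (line_number, field_count) for every row whose field count
   differs from the header. Assumes no quoted fields containing [sep]. *)
let mismatched_rows ?(sep = ',') fname =
  let ic = open_in fname in
  Fun.protect
    ~finally:(fun () -> close_in ic)
    (fun () ->
      let count line = List.length (String.split_on_char sep line) in
      (* The first line is treated as the header; raises End_of_file on an
         empty file. *)
      let expected = count (input_line ic) in
      let rec loop lineno acc =
        match input_line ic with
        | line ->
          let n = count line in
          loop (lineno + 1) (if n <> expected then (lineno, n) :: acc else acc)
        | exception End_of_file -> List.rev acc
      in
      (expected, loop 2 []))
```

Calling `mismatched_rows "ETC-BTC.csv"` on the large file would return an empty list if every row matches the header. Note that a blank line counts as one empty field, so it would be reported as a mismatch.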
Consider using https://github.com/Chris00/ocaml-csv
Looks like the assertion fails exactly when the CSV has more than 100 lines.
This issue is quite old and the code might have changed since then. However, 'assertion when CSV > 100 lines' is very similar to the bug I fixed here: https://github.com/owlbarn/owl/pull/639. In that case the assertion was used for control flow: the code that guesses the CSV separator and types wants to stop iterating over lines once it has read more than 100 lines of a file. The assertion failure was printed to the console and otherwise ignored. Replacing the assert with a proper exception makes the assertion failure go away.
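To illustrate the general idea (this is a minimal sketch, not the code from the PR or from Owl itself), early termination after a fixed number of lines can be signalled with a dedicated exception instead of an assert. The names `Stop_reading` and `head_lines` are made up for this example.

```ocaml
(* Illustrative only: a dedicated exception used to stop reading early. *)
exception Stop_reading

(* Read at most [limit] lines from [fname] and return them in file order. *)
let head_lines ?(limit = 100) fname =
  let ic = open_in fname in
  Fun.protect
    ~finally:(fun () -> close_in ic)
    (fun () ->
      let acc = ref [] in
      let n = ref 0 in
      (try
         while true do
           (* Stop via an explicit exception once the limit is reached. *)
           if !n >= limit then raise Stop_reading;
           acc := input_line ic :: !acc;
           incr n
         done
       with Stop_reading | End_of_file -> ());
      List.rev !acc)
```

Because the handler catches only `Stop_reading` and `End_of_file`, a genuine `Assert_failure` raised elsewhere would still propagate instead of being mistaken for the normal stop condition.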
Could you check whether my PR fixes your problem?