Failure to read/scan ndjson file with faulty line with `ignore_errors=True`
Checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of Polars.
Reproducible example
example.json:
{"a": 1,"b": 3}
x
{"a": 4,"b": 2}
scan:
import polars as pl
pl.scan_ndjson('example.json', ignore_errors=True).collect()
read:
import polars as pl
pl.read_ndjson('example.json', ignore_errors=True)
Log output
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/nbrr/Library/Caches/pypoetry/virtualenvs/env-FCFMsks8-py3.11/lib/python3.11/site-packages/polars/utils/deprecation.py", line 133, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nbrr/Library/Caches/pypoetry/virtualenvs/env-FCFMsks8-py3.11/lib/python3.11/site-packages/polars/utils/deprecation.py", line 133, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nbrr/Library/Caches/pypoetry/virtualenvs/env-FCFMsks8-py3.11/lib/python3.11/site-packages/polars/io/ndjson.py", line 110, in scan_ndjson
    return pl.LazyFrame._scan_ndjson(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nbrr/Library/Caches/pypoetry/virtualenvs/env-FCFMsks8-py3.11/lib/python3.11/site-packages/polars/lazyframe/frame.py", line 555, in _scan_ndjson
    self._ldf = PyLazyFrame.new_from_ndjson(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.ComputeError: InternalError(TapeError) at character 0 ('x')
Issue description
Both `read_ndjson` and `scan_ndjson` fail to process an NDJSON file containing a line that is not valid JSON, even when `ignore_errors=True` is passed.
Expected behavior
The file `example.json` is read and the non-JSON line is ignored, leaving the two valid rows.
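Until this is resolved, one possible workaround is to drop the unparsable lines before handing the data to Polars. The sketch below is only an illustration of that idea, not how `ignore_errors` is implemented: the helper names `keep_valid_ndjson_lines` and `read_ndjson_skipping_bad_lines` are hypothetical, and it assumes the file fits in memory and that every valid record is a JSON object.

```python
import io
import json


def keep_valid_ndjson_lines(text: str) -> str:
    """Return only the lines of `text` that parse as a JSON object.

    Hypothetical helper: it pre-filters the input and is not part of Polars.
    """
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        try:
            value = json.loads(stripped)
        except json.JSONDecodeError:
            continue  # drop lines such as the bare `x` in example.json
        if isinstance(value, dict):
            kept.append(stripped)
    return ("\n".join(kept) + "\n") if kept else ""


def read_ndjson_skipping_bad_lines(path):
    """Filter the file first, then pass the cleaned bytes to Polars."""
    import polars as pl  # imported lazily so the filter itself needs only the stdlib

    with open(path, encoding="utf-8") as fh:
        cleaned = keep_valid_ndjson_lines(fh.read())
    return pl.read_ndjson(io.BytesIO(cleaned.encode("utf-8")))
```

Calling `read_ndjson_skipping_bad_lines('example.json')` on the file above should load only the first and third lines. This gives up the streaming benefit of `scan_ndjson`, so it is a stopgap rather than a substitute for a working `ignore_errors=True`.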
Installed versions
--------Version info---------
Polars: 0.20.4
Index type: UInt32
Platform: macOS-10.16-x86_64-i386-64bit
Python: 3.11.5 (main, Sep 11 2023, 08:19:27) [Clang 14.0.6 ]
----Optional dependencies----
adbc_driver_manager: <not installed>
cloudpickle: 3.0.0
connectorx: <not installed>
deltalake: <not installed>
fsspec: 2023.12.2
gevent: <not installed>
hvplot: <not installed>
matplotlib: <not installed>
numpy: 1.26.3
openpyxl: <not installed>
pandas: 2.1.4
pyarrow: 13.0.0
pydantic: <not installed>
pyiceberg: <not installed>
pyxlsb: <not installed>
sqlalchemy: <not installed>
xlsx2csv: <not installed>
xlsxwriter: <not installed>