mongo-spark
Added parse mode support when reading data from MongoDB.
Adds the mode configuration, which allows different parsing strategies for handling documents that don't match the expected schema during reads.
The options are:
- FAILFAST (default): throws an exception when parsing a document that doesn't match the schema.
- PERMISSIVE: sets any invalid fields to null. Combine with the columnNameOfCorruptRecord configuration if you want to store any invalid documents as an extended JSON string.
- DROPMALFORMED: ignores the whole document.
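To make the difference between the three modes concrete, here is a minimal pure-Python model of the behaviour described above. It is only an illustration and not the connector's implementation: the schema representation, the `parse_documents` helper, and the sample documents are all hypothetical.

```python
# Hypothetical schema: field name -> expected Python type.
SCHEMA = {"name": str, "age": int}

VALID_MODES = ("FAILFAST", "PERMISSIVE", "DROPMALFORMED")


def parse_documents(docs, schema, mode="FAILFAST"):
    """Illustrative model of the three parse modes; not the connector's code."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode}")
    rows = []
    for doc in docs:
        # A field is invalid when it is present but has the wrong type.
        invalid = [
            field for field, expected in schema.items()
            if doc.get(field) is not None and not isinstance(doc[field], expected)
        ]
        if not invalid:
            rows.append({field: doc.get(field) for field in schema})
        elif mode == "FAILFAST":
            # Default: stop at the first document that does not match.
            raise ValueError(f"document does not match schema: {doc!r}")
        elif mode == "PERMISSIVE":
            # Keep the row, but null out only the fields that failed to parse.
            rows.append(
                {field: (None if field in invalid else doc.get(field)) for field in schema}
            )
        # DROPMALFORMED: skip the whole document.
    return rows


docs = [{"name": "Ada", "age": 36}, {"name": "Bob", "age": "unknown"}]
permissive_rows = parse_documents(docs, SCHEMA, mode="PERMISSIVE")
dropped_rows = parse_documents(docs, SCHEMA, mode="DROPMALFORMED")
```

In this model, PERMISSIVE keeps both rows and nulls the bad `age` value, while DROPMALFORMED keeps only the first row.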
Adds the columnNameOfCorruptRecord configuration, which extends the PERMISSIVE mode. When configured, it saves the whole invalid document as extended JSON in that column, as long as it is defined in the schema. Inferred schemas will add the columnNameOfCorruptRecord column if it is set and the mode is PERMISSIVE.
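As a rough sketch of how these two settings might be passed from PySpark: the `mode` and `columnNameOfCorruptRecord` keys come from this change, while everything else is an assumption. That includes the `mongodb` format name and the `connection.uri`, `database`, and `collection` keys (taken from the 10.x connector's read options), the `_corrupt_record` column name, the example schema fields, and both helper functions.

```python
def build_read_options(uri, database, collection, corrupt_column="_corrupt_record"):
    """Collect read options; the helper and the default column name are hypothetical."""
    return {
        "connection.uri": uri,       # assumed 10.x connector option name
        "database": database,        # assumed 10.x connector option name
        "collection": collection,    # assumed 10.x connector option name
        "mode": "PERMISSIVE",
        "columnNameOfCorruptRecord": corrupt_column,
    }


def load_permissive(spark, options):
    """Read with an explicit schema that declares the corrupt-record column."""
    # Imported lazily so build_read_options works without PySpark installed.
    from pyspark.sql.types import IntegerType, StringType, StructField, StructType

    corrupt_column = options["columnNameOfCorruptRecord"]
    schema = StructType([
        StructField("name", StringType(), True),
        StructField("age", IntegerType(), True),
        # The column must be part of the schema to receive invalid documents.
        StructField(corrupt_column, StringType(), True),
    ])
    return spark.read.format("mongodb").options(**options).schema(schema).load()


# Placeholder connection details, for illustration only.
options = build_read_options("mongodb://localhost:27017", "test", "people")
```

With an existing SparkSession, `load_permissive(spark, options)` would return a DataFrame in which rows from invalid documents carry the original document as an extended JSON string in the `_corrupt_record` column.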
Note: The names derive from the existing Spark JSON configurations, which inspired this feature.
SPARK-327