mongo-spark: Added parse mode support when reading data from MongoDB.


Open: rozza opened this issue.

Adds the mode configuration, which selects the parsing strategy used when a document read from MongoDB doesn't match the expected schema.

The options are:

FAILFAST (default): throws an exception when parsing a document that doesn't match the schema.
PERMISSIVE: sets any invalid fields to null. Combine with the columnNameOfCorruptRecord configuration if you also want to store each invalid document as an Extended JSON string.
DROPMALFORMED: ignores (drops) the whole document.
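The behaviour of the three modes can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the connector's implementation: documents are plain dicts, the schema is a hypothetical mapping of field names to Python types, and only type mismatches count as invalid (missing fields simply become null). ParseMode and parse_documents are hypothetical names.

```python
from enum import Enum
from typing import Any, Dict, Iterable, List


class ParseMode(Enum):
    FAILFAST = "FAILFAST"
    PERMISSIVE = "PERMISSIVE"
    DROPMALFORMED = "DROPMALFORMED"


# Hypothetical schema representation: field name -> expected Python type.
Schema = Dict[str, type]


def _matches(value: Any, expected: type) -> bool:
    """Return True if value fits the expected type (None counts as a null)."""
    if value is None:
        return True  # missing or null fields are treated as valid (nullable)
    if expected is int and isinstance(value, bool):
        return False  # bool subclasses int in Python; reject it explicitly
    return isinstance(value, expected)


def parse_documents(
    docs: Iterable[Dict[str, Any]],
    schema: Schema,
    mode: ParseMode = ParseMode.FAILFAST,
) -> List[Dict[str, Any]]:
    """Project documents onto the schema, handling mismatches according to mode."""
    rows: List[Dict[str, Any]] = []
    for doc in docs:
        row: Dict[str, Any] = {}
        malformed = False
        for field, expected in schema.items():
            value = doc.get(field)
            if _matches(value, expected):
                row[field] = value
                continue
            malformed = True
            if mode is ParseMode.FAILFAST:
                # FAILFAST: stop on the first document that doesn't match.
                raise ValueError(
                    f"field {field!r}: expected {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
            row[field] = None  # PERMISSIVE: null out just the invalid field
        if malformed and mode is ParseMode.DROPMALFORMED:
            continue  # DROPMALFORMED: skip the whole document
        rows.append(row)
    return rows
```

For example, with the documents {"name": "a", "age": 1} and {"name": "b", "age": "ten"} and the schema {"name": str, "age": int}, PERMISSIVE keeps both rows with age set to None in the second, DROPMALFORMED keeps only the first, and FAILFAST raises ValueError.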

Adds the columnNameOfCorruptRecord configuration, which extends the PERMISSIVE mode. When configured, the whole invalid document is saved as Extended JSON in that column, as long as the column is defined in the schema. Inferred schemas will add the columnNameOfCorruptRecord column if it is set and the mode is PERMISSIVE.
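A similar sketch for columnNameOfCorruptRecord, again under stated assumptions: extend_inferred_schema and parse_permissive are hypothetical helpers, the corrupt-record column is modelled as a string field, and json.dumps stands in for MongoDB Extended JSON (real Extended JSON encodes BSON types such as ObjectId and dates differently).

```python
import json
from typing import Any, Dict, Iterable, List, Optional

# Hypothetical schema representation: field name -> expected Python type.
Schema = Dict[str, type]


def _matches(value: Any, expected: type) -> bool:
    """Return True if value fits the expected type (None counts as a null)."""
    if value is None:
        return True
    if expected is int and isinstance(value, bool):
        return False  # bool subclasses int in Python; reject it explicitly
    return isinstance(value, expected)


def extend_inferred_schema(
    schema: Schema, mode: str, corrupt_column: Optional[str]
) -> Schema:
    """Add the corrupt-record column only when it is set and mode is PERMISSIVE."""
    if corrupt_column and mode == "PERMISSIVE" and corrupt_column not in schema:
        return {**schema, corrupt_column: str}
    return dict(schema)


def parse_permissive(
    docs: Iterable[Dict[str, Any]],
    schema: Schema,
    corrupt_column: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """PERMISSIVE parsing that optionally keeps the raw invalid document."""
    # The raw document is only stored if the schema actually defines the column.
    use_corrupt = corrupt_column is not None and corrupt_column in schema
    rows: List[Dict[str, Any]] = []
    for doc in docs:
        row: Dict[str, Any] = {}
        malformed = False
        for field, expected in schema.items():
            if use_corrupt and field == corrupt_column:
                continue  # filled in below, not read from the document
            value = doc.get(field)
            if _matches(value, expected):
                row[field] = value
            else:
                malformed = True
                row[field] = None  # invalid fields become null
        if use_corrupt:
            # Stand-in for Extended JSON: serialise the whole original document.
            row[corrupt_column] = json.dumps(doc, sort_keys=True) if malformed else None
        rows.append(row)
    return rows
```

Valid documents get null in the corrupt-record column; if the column is missing from the schema, parsing behaves like plain PERMISSIVE and the raw document is not kept.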

Note: The option names derive from Spark's existing JSON data source options, which inspired this feature.

SPARK-327

Jun 12 '24 11:06 rozza
