
Writes broken schemas in 2.0.0

Open: edgarrmondragon opened this issue 4 years ago • 1 comment

After switching to 2.0.0, I found that this tap outputs bad schema messages:

Original schema:

{
  "properties": {
    "_id": {
      "type": "string"
    },
    "student_id": {
      "type": "integer"
    },
    "class_id": {
      "type": "integer"
    },
    "scores": {
      "items": {
        "properties": {
          "type": {
            "type": "string"
          },
          "score": {
            "type": "number"
          }
        },
        "type": "object"
      },
      "type": "array"
    }
  },
  "type": "object",
  "additionalProperties": true
}

Schema output by tap:

{
  "type": "object",
  "properties": {
    "scores": {
      "anyOf": [
        {
          "type": "array",
          "items": {
            "anyOf": [
              {
                "type": "object",
                "properties": {
                  "score": {
                    "anyOf": [{"type": "number"}, {}]
                  }
                }
              },
              {}
            ]
          }
        },
        {}
      ]
    }
  }
}

All my records have the same schema, and they look like this:

{
  "_id": "50b59cd75bed76f46522c34e",
  "student_id": 0,
  "class_id": 2,
  "scores": [
    {
      "type": "exam",
      "score": 57.92947112575566
    },
    {
      "type": "quiz",
      "score": 21.24542588206755
    },
    {
      "type": "homework",
      "score": 68.1956781058743
    },
    {
      "type": "homework",
      "score": 67.95019716560351
    },
    {
      "type": "homework",
      "score": 18.81037253352722
    }
  ]
}

edgarrmondragon, Mar 18 '20 23:03

Hi @edgarrmondragon -- the tap-mongodb schema generation was introduced in #40, and since MongoDB is NoSQL, it does not attempt to write a strict schema. Instead, it only writes a schema for date-times, decimals, and numbers as it sees them.

This is intended to overcome problems that users were experiencing in certain targets where date-times were being written as strings, and doubles/decimals were sometimes being split into different columns depending on the precision of the value.
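To make the behavior described above concrete, here is a rough sketch of how a loose schema of this shape could be derived from a record. It is not the tap's actual code: `loose`, `infer`, and `infer_record` are hypothetical names, the date-time representation is an assumption, and lists are simplified to use the first item that needs typing. With the record from the report, it yields the same schema the tap emitted.

```python
import datetime
from decimal import Decimal


def loose(schema):
    """Wrap a schema so any other value is still accepted ({} matches anything)."""
    return {"anyOf": [schema, {}]}


def infer(value):
    """Return a loose schema for value, or None when nothing needs typing."""
    if isinstance(value, datetime.datetime):
        # Assumption: date-times are described as date-time formatted strings.
        return loose({"type": "string", "format": "date-time"})
    if isinstance(value, (float, Decimal)):
        return loose({"type": "number"})
    if isinstance(value, dict):
        props = {}
        for key, item in value.items():
            item_schema = infer(item)
            if item_schema is not None:
                props[key] = item_schema
        return loose({"type": "object", "properties": props}) if props else None
    if isinstance(value, list):
        # Simplification: use the first item that yields a schema.
        for item in value:
            item_schema = infer(item)
            if item_schema is not None:
                return loose({"type": "array", "items": item_schema})
        return None
    # Strings, integers, and everything else are left untyped.
    return None


def infer_record(record):
    """Build the top-level schema for one document."""
    props = {}
    for key, value in record.items():
        value_schema = infer(value)
        if value_schema is not None:
            props[key] = value_schema
    return {"type": "object", "properties": props}


# Record from the report, trimmed to two scores.
RECORD = {
    "_id": "50b59cd75bed76f46522c34e",
    "student_id": 0,
    "class_id": 2,
    "scores": [
        {"type": "exam", "score": 57.92947112575566},
        {"type": "quiz", "score": 21.24542588206755},
    ],
}

SCHEMA = infer_record(RECORD)
```

Under these assumptions, `_id`, `student_id`, and `class_id` drop out because they are strings and integers, which is why only `scores` appears in the emitted schema.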

Also, I checked https://www.jsonschemavalidator.net/, and the schema that was generated looks like it validates against your record.
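For anyone who wants to reproduce that check locally, here is a minimal sketch of a validator covering only the keywords that appear in this issue (`type`, `properties`, `items`, `anyOf`). The function name `is_valid` is hypothetical and this is not a full JSON Schema implementation. It also shows why the generated schema validates: every `anyOf` includes the empty schema `{}`, which accepts any value.

```python
def is_valid(instance, schema):
    """Check instance against a small subset of JSON Schema."""
    # anyOf: at least one subschema must accept the instance.
    if "anyOf" in schema:
        if not any(is_valid(instance, sub) for sub in schema["anyOf"]):
            return False

    # type: only the types used in this issue are handled.
    expected = schema.get("type")
    if expected == "object" and not isinstance(instance, dict):
        return False
    if expected == "array" and not isinstance(instance, list):
        return False
    if expected == "string" and not isinstance(instance, str):
        return False
    if expected == "number":
        if isinstance(instance, bool) or not isinstance(instance, (int, float)):
            return False

    # properties: applies only to objects, and only to keys that are present.
    if isinstance(instance, dict):
        for key, sub in schema.get("properties", {}).items():
            if key in instance and not is_valid(instance[key], sub):
                return False

    # items: applies only to arrays, and every element must match.
    if isinstance(instance, list) and "items" in schema:
        if not all(is_valid(item, schema["items"]) for item in instance):
            return False

    return True


# Schema emitted by the tap, as quoted in the report.
TAP_SCHEMA = {
    "type": "object",
    "properties": {
        "scores": {
            "anyOf": [
                {
                    "type": "array",
                    "items": {
                        "anyOf": [
                            {
                                "type": "object",
                                "properties": {
                                    "score": {"anyOf": [{"type": "number"}, {}]}
                                },
                            },
                            {},
                        ]
                    },
                },
                {},
            ]
        }
    },
}

# Record from the report, trimmed to two scores.
RECORD = {
    "_id": "50b59cd75bed76f46522c34e",
    "student_id": 0,
    "class_id": 2,
    "scores": [
        {"type": "exam", "score": 57.92947112575566},
        {"type": "quiz", "score": 21.24542588206755},
    ],
}

record_ok = is_valid(RECORD, TAP_SCHEMA)
# A record whose "scores" is not an array also passes, via the {} branch.
odd_record_ok = is_valid({"scores": "not an array"}, TAP_SCHEMA)
```

Both checks pass, which fits the explanation above: the schema is a typing hint for targets rather than a strict contract.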

nick-mccoy, Mar 20 '20 13:03