
Config validation for PreProcessor's split_by parameter fails when set to null

Open E-dC opened this issue 2 years ago • 1 comments

Describe the bug

Validation fails for a PreProcessor component in a YAML config, or in a config dictionary, when the parameter split_by is set to null/None.

The parameter split_by is meant to accept the values: "word", "sentence", "passage" or None (to disable splitting).

Error message

ValidationError: {'name': 'my_preprocessor', 'type': 'PreProcessor', 'params': {'split_by': None}} is not valid under any of the given schemas

Expected behavior

No validation error should occur when split_by is set to null/None.

Additional context

I fixed the problem locally by adding null to the split_by param's enum:

"split_by": {
  "title": "Split By",
  "default": "word",
  "enum": [
    "word",
    "sentence",
    "passage",
    null
  ],
  "anyOf": [
    {
      "type": "string"
    },
    {
      "type": "null"
    }
  ]
}
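To illustrate why the extra enum entry matters: in JSON Schema, every keyword in a schema object must pass, so a value has to satisfy both `enum` and `anyOf`. The sketch below is a simplified, standard-library-only stand-in for those two keywords, not Haystack's actual validator. The `is_valid` and `matches_type` helpers are hypothetical, and `SCHEMA_WITHOUT_NULL` is an assumption about what the schema looked like before the local fix (the patched enum minus `null`).

```python
from typing import Any, Dict


def matches_type(value: Any, type_name: str) -> bool:
    """Hypothetical helper covering only the two types used in the issue."""
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "null":
        return value is None
    raise ValueError(f"unsupported type in this sketch: {type_name}")


def is_valid(value: Any, schema: Dict[str, Any]) -> bool:
    """Simplified check: the value must pass every keyword that is present."""
    if "enum" in schema and value not in schema["enum"]:
        return False
    if "anyOf" in schema and not any(
        matches_type(value, sub["type"]) for sub in schema["anyOf"]
    ):
        return False
    return True


ANY_OF = [{"type": "string"}, {"type": "null"}]

# Assumed pre-fix shape: null is allowed by anyOf but missing from enum.
SCHEMA_WITHOUT_NULL = {"enum": ["word", "sentence", "passage"], "anyOf": ANY_OF}

# Shape after the local fix described above: null is also listed in enum.
SCHEMA_WITH_NULL = {"enum": ["word", "sentence", "passage", None], "anyOf": ANY_OF}

print(is_valid(None, SCHEMA_WITHOUT_NULL))  # False: enum rejects None
print(is_valid(None, SCHEMA_WITH_NULL))     # True: both keywords accept None
```

Under these assumptions, `anyOf` alone is not enough to permit `None`; the value is rejected as soon as the `enum` check fails, which matches the error reported above.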

I looked briefly and did not find any other parameters, on this or other components, that have the same bug.

To Reproduce

Run this code:

from haystack.pipelines.utils import validate_schema

CONF = {
    'version': '1.17.2',
    'components': [
        {'name': 'document_store', 'type': 'InMemoryDocumentStore'},
        {'name': 'json_converter', 'type': 'JsonConverter'},
        {
            'name': 'my_preprocessor',
            'type': 'PreProcessor',
            'params': {'split_by': None}
        }
    ],
    'pipelines': [
        {
            'name': 'indexing',
            'nodes': [
                {'name': 'json_converter', 'inputs': ['File']},
                {'name': 'my_preprocessor', 'inputs': ['json_converter']},
                {'name': 'document_store', 'inputs': ['my_preprocessor']}
            ]
        }
    ]
}

validate_schema(CONF)

FAQ Check

System:

  • OS: Linux
  • GPU/CPU:
  • Haystack version (commit or version number): 1.17.2
  • DocumentStore: InMemoryDocumentStore
  • Reader: NA
  • Retriever: NA

E-dC, Jun 22 '23 15:06

Hey @E-dC, could you try to reproduce this with the latest version? If it no longer fails, I'd consider it fixed.

silvanocerza, Jul 05 '23 17:07