Pipeline validation fails for PreProcessor initialised with strings containing spaces for the remove_substrings param

Open mathislucka opened this issue 3 years ago • 0 comments

Describe the bug

When the PreProcessor is configured with a remove_substrings value that contains a space, pipeline validation fails if the pipeline is loaded with load_from_config or load_from_yaml.

Error message

raise PipelineConfigError(
haystack.errors.PipelineConfigError: 'with space' is not a valid variable name or value. Use alphanumeric characters or dash, underscore and colon only.)

The same applies to any string parameter value in custom nodes, which severely limits what a custom node can accept as configuration.
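The rejection can be illustrated outside Haystack with a small sketch. The pattern below is an assumption reconstructed from the wording of the error message (alphanumeric characters, dash, underscore, colon); it is not copied from the Haystack source, and `is_valid_token` is a hypothetical helper.

```python
import re

# Hypothetical whitelist inferred from the error message; the pattern
# Haystack actually uses may allow additional characters.
ALLOWED = re.compile(r"[A-Za-z0-9_:\-]+")


def is_valid_token(value: str) -> bool:
    """Return True if the whole string consists of whitelisted characters."""
    return ALLOWED.fullmatch(value) is not None


if __name__ == "__main__":
    # A substring without spaces passes; one with a space does not.
    print(is_valid_token("with_space"))
    print(is_valid_token("with space"))
```

Under a whitelist like this, any value containing a space, punctuation, or most non-ASCII characters is rejected, whether or not the node could handle it.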

Expected behavior

Pipeline validation should not restrict the formatting of the values that are passed in. Checking variable names is fine, but values shouldn't be checked.
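One way to picture the requested behavior is a validator that walks the config and checks only the keys. This is a minimal sketch, not the Haystack implementation: `NAME_PATTERN` and `validate_names` are hypothetical, and the sketch treats only dictionary keys as names, so fields such as a component's `name` or `type` would need their own check.

```python
import re
from typing import Any

# Hypothetical pattern for names, mirroring the character set listed in
# the error message.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_:\-]+")


def validate_names(config: Any) -> None:
    """Recursively validate dictionary keys; leave values unchecked."""
    if isinstance(config, dict):
        for key, value in config.items():
            if not NAME_PATTERN.fullmatch(str(key)):
                raise ValueError(f"'{key}' is not a valid variable name.")
            validate_names(value)
    elif isinstance(config, list):
        for item in config:
            validate_names(item)
    # Anything else is a parameter value and passes through untouched.


if __name__ == "__main__":
    # The value 'with space' is accepted because only keys are checked.
    validate_names({"params": {"remove_substrings": ["with space"]}})
```

With this split, a malformed key such as `'bad key'` still raises, while free-form strings in parameter values are left to the node itself.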

Additional context

To Reproduce

from haystack.pipelines import Pipeline

if __name__ == '__main__':
    pipeline_cfg = {
        'version': '1.8',
        'components': [
            {
                'name': 'preprocessor',
                'type': 'PreProcessor',
                'params': {
                    'remove_substrings': ['with space']
                }
            }
        ],
        'pipelines': [
            {
                'name': 'indexing',
                'nodes': [
                    {'name': 'preprocessor', 'inputs': ['File']}
                ]
            }
        ]
    }

    pp = Pipeline.load_from_config(pipeline_cfg, pipeline_name='indexing')

FAQ Check

System:

  • OS:
  • GPU/CPU:
  • Haystack version (commit or version number): 1.8
  • DocumentStore:
  • Reader:
  • Retriever:

mathislucka • Sep 20 '22 14:09