haystack
haystack copied to clipboard
Pipeline Env Var not converted to int properly (incl solution)
Describe the bug
When defining haystack-pipeline.yml components via docker compose's environment variables (using the naming convention COMPONENT_PARAMS_NAME
), integers are not parsed correctly.
Docker Compose:
services:
haystack-api:
build:
context: .
image: "deepset/haystack:gpu"
volumes:
- ~/haystack-pipelines:/opt/pipelines
expose:
- 8000
restart: no
environment:
- PIPELINE_YAML_PATH=/opt/pipelines/pipeline-managed.yml
# Haystack pipeline configs
- PREPROCESSOR_PARAMS_SPLIT_BY=${PREPROCESSOR_PARAMS_SPLIT_BY:-sentence} # split by sentence by default
- PREPROCESSOR_PARAMS_SPLIT_LENGTH=5
- DOCUMENTSTORE_PARAMS_INDEX
Error message
split_length
(defined via PREPROCESSOR_PARAMS_SPLIT_LENGTH
) is not correctly converted to integer. As it is coming as an env variable, it comes as a string unfortunately, throwing the following error:
File "/opt/venv/lib/python3.10/site-packages/haystack/nodes/preprocessor/preprocessor.py", line 579, in _concatenate_units
segments = windowed(elements, n=split_length, step=split_length - split_overlap)
TypeError: unsupported operand type(s) for -: 'str' and 'int'
...
Exception: Exception while running node 'Preprocessor': unsupported operand type(s) for -: 'str' and 'int'
Expected behavior
I believe this can be fixed by adding
if value.is_digit():
component_definition["params"][param_name] = int(value)
in haystack/pipelines/config.py
, line 84.
original code:
if overwrite_with_env_variables:
for key, value in os.environ.items():
env_prefix = f"{name}_params_".upper()
if key.startswith(env_prefix):
param_name = key.replace(env_prefix, "").lower()
if "params" not in component_definition:
component_definition["params"] = {}
component_definition["params"][param_name] = value
#
# new code here
#
logger.info(
"Param '%s' of component '%s' overwritten with environment variable '%s' value '%s'.",
param_name,
name,
key,
"***",
)
Additional context Defining this in the haystack-pipeline.yaml yields the correct results as yamls are able to parse int.
- name: Preprocessor
type: PreProcessor
params:
split_length: 5
split_overlap: 1
To Reproduce
FAQ Check
- [x] Have you had a look at our new FAQ page?
System:
- OS: Ubuntu
- GPU/CPU: GPU
- Haystack version (commit or version number): v1.16.1 from Docker hub
- DocumentStore: Elastic Search
- Reader: FARMReader (deepset/roberta-base-squad2)
- Retriever: EmbeddingRetriever (text-embedding-ada-002 Azure)
- Prompt: PromptNode (gpt-35-turbo Azure)
Hi @LisaGLH Thank you for reporting this and for suggesting how to fix it. Would you maybe like to open a pull request with your fix? That would be great! 🙂