
[BUG] warning message dumps large amount of data to stdout

Open s4ke opened this issue 3 weeks ago • 6 comments

Description

We just ran into a case where the warnings dumped large chunks of actual data to stdout. While I understand where this comes from, it is a bad idea because it can accidentally leak PII into stdout/stderr. In addition, the old behaviour was working as intended for our use case.

In our case, this was caused by this class: https://github.com/neuroforgede/nfcompose/blob/1ad30313e1bdbdb7c3d8e35fd74f905924e2003e/client/compose_client/library/models/definition/datapoint.py#L32

Note that we use dataclass_json for serialization, but have extra code that prevents non-primitive data from reaching the actual JSON serialization.

Example log (from Python 3.8):

/home/<....>/integration/function_layout/compose/venv/lib/python3.8/site-packages/dataclasses_json/core.py:342: UserWarning: Failed to decode {'data': {'season': '', 'branchCombination': '',
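As a stopgap on the caller's side, the warning could be filtered by its message prefix using the standard `warnings` module. This is a minimal sketch that assumes the message keeps starting with "Failed to decode", as in the log above; the helper names `install_decode_warning_filter` and `demo_filter` are hypothetical. It only hides the warning and does not change what the library puts into the message.

```python
import warnings
from typing import List


def install_decode_warning_filter() -> None:
    """Ignore UserWarnings whose text starts with 'Failed to decode'.

    Assumption: the prefix matches the log line shown in this issue.
    The `message` argument is a regex matched against the start of the text.
    """
    warnings.filterwarnings(
        "ignore",
        message=r"Failed to decode",
        category=UserWarning,
    )


def demo_filter() -> List[str]:
    """Emit one matching and one unrelated warning; return what gets through."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")   # record everything by default
        install_decode_warning_filter()   # inserted in front, so it wins
        warnings.warn("Failed to decode {'data': {'season': ''}}", UserWarning)
        warnings.warn("unrelated warning", UserWarning)
    return [str(w.message) for w in caught]
```

Calling `install_decode_warning_filter()` once at application startup would apply the filter process-wide. Other UserWarnings still get through, so the filter is narrower than silencing the whole category.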

Code snippet that reproduces the issue

# This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. 
# If a copy of the MPL was not distributed with this file, 
# You can obtain one at https://mozilla.org/MPL/2.0/.
# This file is part of NF Compose
# [2019] - [2024] © NeuroForge GmbH & Co. KG

from dataclasses import dataclass, field
from typing import Any, Dict, Union, BinaryIO

from dataclasses_json import dataclass_json, Undefined

from compose_client.library.models.definition.data_series_definition import DataSeriesDefinition
from compose_client.library.models.identifiable import Identifiable
from compose_client.library.models.raw.datapoint import RawDataPoint


# explicitly no dataclass so we dont forget to implement a proper serializer
class FileTypeContent:
    url: str

    def __init__(self, url: str):
        self.url = url


Primitive = Union[str, float, int, bool]


@dataclass_json(undefined=Undefined.EXCLUDE)
@dataclass
class DataPoint(Identifiable):
    external_id: str
    payload: Dict[str, Union[Primitive, FileTypeContent, BinaryIO]]
    identify_dimensions_by_external_id: bool = field(default=True)

    @staticmethod
    def from_raw(raw: RawDataPoint, definition: DataSeriesDefinition) -> 'DataPoint':
        payload = raw.payload.copy()
        for file_like_fact in [*definition.structure.file_facts, *definition.structure.image_facts]:
            if file_like_fact.external_id in payload:
                payload[file_like_fact.external_id] = FileTypeContent(url=payload[file_like_fact.external_id])
        return DataPoint(
            external_id=raw.external_id,
            payload=payload
        )

    def to_dict(self) -> Any: ...

    @staticmethod
    def from_dict(dict: Dict[str, Any]) -> 'DataPoint': ...

Describe the results you expected

The warning should not log the data unprompted. This is a data security issue.
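To illustrate the expected behaviour, here is a rough sketch of a warning that reports only the type and size of the value that failed to decode, never its contents. It is not the library's implementation; `describe_without_values` and `warn_decode_failure` are hypothetical names used only for this example.

```python
import warnings
from typing import Any


def describe_without_values(value: Any) -> str:
    """Summarise a value by type name and length only, never by content."""
    type_name = type(value).__name__
    try:
        return f"<{type_name} of length {len(value)}>"
    except TypeError:
        # Value has no length (e.g. int, float): report the type alone.
        return f"<{type_name}>"


def warn_decode_failure(value: Any, target: str) -> None:
    """Hypothetical stand-in for a warning that would otherwise embed repr(value)."""
    warnings.warn(
        f"Failed to decode {describe_without_values(value)} as {target}",
        UserWarning,
        stacklevel=2,
    )
```

With this shape, a payload such as `{"name": "Jane Doe"}` would show up in the log as `<dict of length 1>`, which still signals that decoding failed without exposing the record.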

Python version you are using

Python 3.10.12

Environment description

certifi==2024.6.2
charset-normalizer==3.3.2
click==8.1.7
compose_client @ https://github.com/neuroforgede/nfcompose/releases/download/2.2.1/compose_client-2.2.1.tar.gz#sha256=08b5d99570e34734b1c5938c26cd57b456282443961d19a69754a096b8f8b14d
dataclasses-json==0.6.7
idna==3.7
marshmallow==3.21.3
mypy-extensions==1.0.0
packaging==24.1
requests==2.32.3
typing-inspect==0.9.0
typing_extensions==4.12.2
urllib3==2.2.2

s4ke • Jun 20 '24 14:06