dataclasses-json
[BUG] warning message dumps large amount of data to stdout
Description
We just ran into a case where the warnings dumped huge chunks of actual data into stdout. While I understand where this comes from, this is a bad idea because it can accidentally leak PII into stdout/stderr. Plus: the old behaviour was "just working as intended" for our use case.
In our case, this was caused by this class: https://github.com/neuroforgede/nfcompose/blob/1ad30313e1bdbdb7c3d8e35fd74f905924e2003e/client/compose_client/library/models/definition/datapoint.py#L32
Note that we use dataclass_json for serialization, but have extra code that prevents non-primitive data from reaching the actual serialization into JSON.
Example log (from Python 3.8):
/home/<....>/integration/function_layout/compose/venv/lib/python3.8/site-packages/dataclasses_json/core.py:342: UserWarning: Failed to decode {'data': {'season': '', 'branchCombination': '',
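A possible stopgap on the caller's side is to filter the warning out with the standard library `warnings` module. This is only a sketch: the helper name `silence_decode_warnings` is made up for illustration, and it assumes the noisy warning is a `UserWarning` whose text starts with "Failed to decode", as in the log line above.

```python
import warnings


def silence_decode_warnings(module: str = "") -> None:
    """Drop UserWarnings whose message starts with 'Failed to decode'.

    `module` is an optional regex matched against the name of the module
    that issued the warning; the empty default matches any module.
    """
    warnings.filterwarnings(
        "ignore",
        message=r"Failed to decode",  # matched against the start of the message
        category=UserWarning,
        module=module,
    )


# Call this once at startup, before any decoding happens.
silence_decode_warnings()
```

Passing `module=r"dataclasses_json\.core"` would narrow the filter to warnings attributed to that module, which is where the log above says they come from. This hides the symptom only; the data is still interpolated into the message by the library.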
Code snippet that reproduces the issue
# This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0.
# If a copy of the MPL was not distributed with this file,
# You can obtain one at https://mozilla.org/MPL/2.0/.
# This file is part of NF Compose
# [2019] - [2024] © NeuroForge GmbH & Co. KG
from dataclasses import dataclass, field
from typing import Any, Dict, Union, BinaryIO
from dataclasses_json import dataclass_json, Undefined
from compose_client.library.models.definition.data_series_definition import DataSeriesDefinition
from compose_client.library.models.identifiable import Identifiable
from compose_client.library.models.raw.datapoint import RawDataPoint
# explicitly no dataclass so we dont forget to implement a proper serializer
class FileTypeContent:
    url: str

    def __init__(self, url: str):
        self.url = url

Primitive = Union[str, float, int, bool]

@dataclass_json(undefined=Undefined.EXCLUDE)
@dataclass
class DataPoint(Identifiable):
    external_id: str
    payload: Dict[str, Union[Primitive, FileTypeContent, BinaryIO]]
    identify_dimensions_by_external_id: bool = field(default=True)

    @staticmethod
    def from_raw(raw: RawDataPoint, definition: DataSeriesDefinition) -> 'DataPoint':
        payload = raw.payload.copy()
        for file_like_fact in [*definition.structure.file_facts, *definition.structure.image_facts]:
            if file_like_fact.external_id in payload:
                payload[file_like_fact.external_id] = FileTypeContent(url=payload[file_like_fact.external_id])
        return DataPoint(
            external_id=raw.external_id,
            payload=payload
        )

    def to_dict(self) -> Any: ...

    @staticmethod
    def from_dict(dict: Dict[str, Any]) -> 'DataPoint': ...
Describe the results you expected
The warning should not log the data unprompted. This is a data security issue.
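To illustrate the kind of message we would expect, here is a rough sketch of a warning that describes the failing value by type and size only. Both `summarize_value` and `warn_decode_failure` are hypothetical names invented for this example; this is not how dataclasses-json is implemented, just one way a value could be kept out of the message.

```python
import warnings
from typing import Any


def summarize_value(value: Any) -> str:
    """Describe a value by type and length only, never by content."""
    type_name = type(value).__name__
    try:
        return f"<{type_name} of length {len(value)}>"
    except TypeError:
        # Values without a length (int, float, ...) are described by type alone.
        return f"<{type_name}>"


def warn_decode_failure(value: Any) -> None:
    """Hypothetical warning that avoids interpolating the raw value."""
    warnings.warn(
        f"Failed to decode {summarize_value(value)}",
        UserWarning,
        stacklevel=2,
    )
```

With something along these lines, a payload like `{"name": "Jane Doe"}` would show up as `<dict of length 1>` in the warning, and the full value could still be made available behind an explicit opt-in for debugging.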
Python version you are using
Python 3.10.12
Environment description
certifi==2024.6.2
charset-normalizer==3.3.2
click==8.1.7
compose_client @ https://github.com/neuroforgede/nfcompose/releases/download/2.2.1/compose_client-2.2.1.tar.gz#sha256=08b5d99570e34734b1c5938c26cd57b456282443961d19a69754a096b8f8b14d
dataclasses-json==0.6.7
idna==3.7
marshmallow==3.21.3
mypy-extensions==1.0.0
packaging==24.1
requests==2.32.3
typing-inspect==0.9.0
typing_extensions==4.12.2
urllib3==2.2.2