
refactor: (codeflash) ⚡️ Speed up function `build_output_logs` by 79%

misrasaurabh1 opened this issue 1 year ago • 1 comment

📄 build_output_logs in src/backend/base/langflow/schema/schema.py

✨ Performance Summary:

  • Speed Increase: 📈 79% (1.79x faster)
  • Runtime Reduction: ⏱️ From 5.84 milliseconds down to 3.26 milliseconds (best of 28 runs)
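
The figures above are consistent with each other; as a quick sanity check, the percentage follows from the two reported runtimes (variable names here are illustrative):

```python
# Reported runtimes from the summary above, in milliseconds
old_ms, new_ms = 5.84, 3.26

# Speed increase as a fraction of the new runtime: old/new - 1
speedup = old_ms / new_ms - 1.0
print(f"{speedup:.0%}")  # approximately 79%
```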

📝 Explanation and details

Here is the optimized version of your Python program.

Changes and improvements:

  1. Consolidated conditions within the `get_message` and `get_type` functions, reducing redundant checks for brevity and faster execution.
  2. Minor conditional optimizations within the `DataFrame` class.
  3. Avoided unnecessary dictionary unpacking and update operations in `build_output_logs`.
  4. Refactored for consistency and readability.
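
Item 3 is the kind of change that typically dominates such a speedup. A minimal sketch of the pattern, with illustrative function names (this is not langflow's actual code):

```python
# Hypothetical before/after for optimization 3: assigning into the result
# dict directly instead of rebuilding it via unpacking on every iteration.

def build_logs_with_unpacking(names, results):
    logs = {}
    for name in names:
        # Re-creates the entire dict on each pass: O(n) work per item
        logs = {**logs, **{name: {"message": results.get(name)}}}
    return logs

def build_logs_direct(names, results):
    logs = {}
    for name in names:
        # In-place assignment: O(1) work per item
        logs[name] = {"message": results.get(name)}
    return logs

# Both produce the same mapping; only the cost per iteration differs.
names = [f"output{i}" for i in range(3)]
results = {n: n.upper() for n in names}
assert build_logs_with_unpacking(names, results) == build_logs_direct(names, results)
```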

Correctness verification

The new optimized code was tested for correctness. The results are listed below:

| Test | Status | Details |
| --- | --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found | |
| 🌀 Generated Regression Tests | ✅ 13 Passed | See below |
| ⏪ Replay Tests | 🔘 None Found | |
| 🔎 Concolic Coverage Tests | 🔘 None Found | |
| 📊 Coverage | 84.6% | |

🌀 Generated Regression Tests Details

from collections.abc import Generator
from enum import Enum
from typing import cast

import pandas as pd
# imports
import pytest  # used for our unit tests
from langflow.schema.data import Data
from langflow.schema.dataframe import DataFrame
from langflow.schema.message import Message
from langflow.schema.schema import build_output_logs
from langflow.schema.serialize import recursive_serialize_or_str
from pandas import DataFrame as pandas_DataFrame
from pydantic import BaseModel
from typing_extensions import TypedDict

# unit tests

# Mock classes to simulate inputs
class MockVertex:
    def __init__(self, outputs):
        self.outputs = outputs

class MockComponentInstance:
    def __init__(self, status=None, _results=None, _artifacts=None):
        self.status = status
        self._results = _results or {}
        self._artifacts = _artifacts or {}

# Basic Functionality
def test_single_output_string_payload():
    vertex = MockVertex(outputs=[{"name": "output1"}])
    result = [MockComponentInstance(_results={"output1": "simple string"})]
    expected_output = {"output1": {"message": "simple string", "type": "text"}}
    codeflash_output = build_output_logs(vertex, result)

def test_multiple_outputs_simple_data_types():
    vertex = MockVertex(outputs=[{"name": "output1"}, {"name": "output2"}, {"name": "output3"}])
    result = [MockComponentInstance(_results={"output1": "string", "output2": {"key": "value"}, "output3": ["item1", "item2"]})]
    expected_output = {
        "output1": {"message": "string", "type": "text"},
        "output2": {"message": {"key": "value"}, "type": "object"},
        "output3": {"message": ["item1", "item2"], "type": "array"}
    }
    codeflash_output = build_output_logs(vertex, result)

# Handling Complex Data Types
def test_payload_dataframe():
    vertex = MockVertex(outputs=[{"name": "output1"}])
    df = DataFrame([{"col1": "val1", "col2": "val2"}])
    result = [MockComponentInstance(_results={"output1": df})]
    expected_output = {"output1": {"message": [{"col1": "val1", "col2": "val2"}], "type": "array"}}
    codeflash_output = build_output_logs(vertex, result)

def test_payload_message_object():
    vertex = MockVertex(outputs=[{"name": "output1"}])
    msg = Message(text="This is a message")
    result = [MockComponentInstance(_results={"output1": msg})]
    expected_output = {"output1": {"message": "This is a message", "type": "message"}}
    codeflash_output = build_output_logs(vertex, result)

# Handling Streams

def test_stream_with_url():
    vertex = MockVertex(outputs=[{"name": "output1"}])
    result = [MockComponentInstance(_results={"output1": {"stream_url": "http://example.com"}})]
    expected_output = {"output1": {"message": {"location": "http://example.com"}, "type": "stream"}}
    codeflash_output = build_output_logs(vertex, result)

# Handling Edge Cases
def test_empty_payload():
    vertex = MockVertex(outputs=[{"name": "output1"}])
    result = [MockComponentInstance(_results={"output1": None})]
    expected_output = {"output1": {"message": None, "type": "unknown"}}
    codeflash_output = build_output_logs(vertex, result)

def test_unknown_types():
    vertex = MockVertex(outputs=[{"name": "output1"}])
    result = [MockComponentInstance(_results={"output1": 12345})]
    expected_output = {"output1": {"message": 12345, "type": "unknown"}}
    codeflash_output = build_output_logs(vertex, result)

# Error Handling
def test_invalid_dataframe_initialization():
    vertex = MockVertex(outputs=[{"name": "output1"}])
    invalid_data = [Data(data={"key": "value"}), {"key": "value"}]  # Mixed types
    with pytest.raises(ValueError, match="List items must be either all Data objects or all dictionaries"):
        DataFrame(invalid_data)

# Large Scale Test Cases
def test_large_dataframe():
    vertex = MockVertex(outputs=[{"name": "output1"}])
    large_df = DataFrame([{"col1": f"val{i}", "col2": f"val{i}"} for i in range(1000)])
    result = [MockComponentInstance(_results={"output1": large_df})]
    expected_output = {"output1": {"message": large_df.to_dict(orient="records"), "type": "array"}}
    codeflash_output = build_output_logs(vertex, result)

def test_large_list_of_outputs():
    outputs = [{"name": f"output{i}"} for i in range(100)]
    vertex = MockVertex(outputs=outputs)
    result = [MockComponentInstance(_results={f"output{i}": f"value{i}" for i in range(100)})]
    expected_output = {f"output{i}": {"message": f"value{i}", "type": "text"} for i in range(100)}
    codeflash_output = build_output_logs(vertex, result)

# Special Cases
def test_status_handling():
    vertex = MockVertex(outputs=[{"name": "output1"}])
    result = [MockComponentInstance(status=None, _results={"output1": "result"}, _artifacts={})]
    expected_output = {"output1": {"message": "result", "type": "text"}}
    codeflash_output = build_output_logs(vertex, result)


def test_mixed_data_types_in_outputs():
    vertex = MockVertex(outputs=[{"name": "output1"}, {"name": "output2"}, {"name": "output3"}])
    result = [MockComponentInstance(_results={"output1": "text", "output2": {"key": "value"}, "output3": DataFrame([{"col1": "val1"}])})]
    expected_output = {
        "output1": {"message": "text", "type": "text"},
        "output2": {"message": {"key": "value"}, "type": "object"},
        "output3": {"message": [{"col1": "val1"}], "type": "array"}
    }
    codeflash_output = build_output_logs(vertex, result)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
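
The expected outputs in the tests above imply a payload-to-type mapping. An illustrative reconstruction of that mapping follows; the real logic lives in `langflow.schema.schema` and also handles `Message` and `DataFrame` payloads, which this sketch omits, and the function name here is hypothetical:

```python
# Sketch of the type mapping the regression tests exercise.
def guess_type(payload):
    if isinstance(payload, str):
        return "text"
    if isinstance(payload, dict):
        # A dict carrying a stream_url is reported as a stream
        return "stream" if "stream_url" in payload else "object"
    if isinstance(payload, list):
        return "array"
    # None, numbers, and anything unrecognized fall through to "unknown"
    return "unknown"

assert guess_type("simple string") == "text"
assert guess_type({"key": "value"}) == "object"
assert guess_type({"stream_url": "http://example.com"}) == "stream"
assert guess_type(["item1", "item2"]) == "array"
assert guess_type(12345) == "unknown"
```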

📣 **Feedback**

If you have any feedback or need assistance, feel free to join our Discord community.

misrasaurabh1 avatar Dec 17 '24 23:12 misrasaurabh1

CodSpeed Performance Report

Merging #5324 will degrade performance by 26.19%

Comparing codeflash-ai:codeflash/optimize-build_output_logs-2024-12-11T11.39.45 (d1fc2cd) with main (9c23759)

Summary

❌ 2 regressions
✅ 13 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | main | codeflash-ai:codeflash/optimize-build_output_logs-2024-12-11T11.39.45 | Change |
| --- | --- | --- | --- |
| test_successful_run_with_input_type_any | 256.2 ms | 340.1 ms | -24.68% |
| test_successful_run_with_output_type_any | 238.5 ms | 323.1 ms | -26.19% |

codspeed-hq[bot] avatar Dec 18 '24 00:12 codspeed-hq[bot]

Hi! I'm autofix.ci, a bot that automatically fixes trivial issues such as code formatting in pull requests.

I would like to apply some automated changes to this pull request, but it looks like I don't have the necessary permissions to do so. To get this pull request into a mergeable state, please do one of the following two things:

  1. Allow edits by maintainers for your pull request, and then re-trigger CI (for example by pushing a new commit).
  2. Manually fix the issues identified for your pull request (see the GitHub Actions output for details on what I would like to change).

autofix-ci[bot] avatar Jan 09 '25 22:01 autofix-ci[bot]

@cbornet can we merge this approved PR?

misrasaurabh1 avatar Feb 10 '25 07:02 misrasaurabh1

@misrasaurabh1 There is an issue with starter projects. Can you check https://github.com/langflow-ai/langflow/pull/5324#issuecomment-2581342703 and resolve it?

cbornet avatar Feb 10 '25 13:02 cbornet