
refactor: (codeflash) ⚡️ Speed up function `build_output_logs` by 79%

misrasaurabh1 opened this issue 1 year ago • 1 comment

📄 build_output_logs in src/backend/base/langflow/schema/schema.py

✨ Performance Summary:

  • Speed Increase: 📈 79% (1.79x faster)
  • Runtime Reduction: ⏱️ From 5.84 milliseconds down to 3.26 milliseconds (best of 28 runs)
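
The figures above are consistent with each other; as a quick sanity check, the percentage follows from the two reported runtimes (variable names here are illustrative):

```python
# Reported runtimes from the summary above, in milliseconds
old_ms, new_ms = 5.84, 3.26

# Speed increase as a fraction of the new runtime: old/new - 1
speedup = old_ms / new_ms - 1.0
print(f"{speedup:.0%}")  # approximately 79%
```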

📝 Explanation and details

Here is the optimized version of your Python program.

Changes and improvements:

  1. Consolidated conditions within the `get_message` and `get_type` functions, reducing redundant checks for brevity and faster execution.
  2. Minor conditional optimizations within the `DataFrame` class.
  3. Avoided unnecessary dictionary unpacking and update operations in `build_output_logs`.
  4. Refactored for consistency and readability.
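
Item 3 is the kind of change that typically dominates such a speedup. A minimal sketch of the pattern, with illustrative function names (this is not langflow's actual code):

```python
# Hypothetical before/after for optimization 3: assigning into the result
# dict directly instead of rebuilding it via unpacking on every iteration.

def build_logs_with_unpacking(names, results):
    logs = {}
    for name in names:
        # Re-creates the entire dict on each pass: O(n) work per item
        logs = {**logs, **{name: {"message": results.get(name)}}}
    return logs

def build_logs_direct(names, results):
    logs = {}
    for name in names:
        # In-place assignment: O(1) work per item
        logs[name] = {"message": results.get(name)}
    return logs

# Both produce the same mapping; only the cost per iteration differs.
names = [f"output{i}" for i in range(3)]
results = {n: n.upper() for n in names}
assert build_logs_with_unpacking(names, results) == build_logs_direct(names, results)
```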

Correctness verification

The new optimized code was tested for correctness. The results are listed below:

| Test | Status | Details |
| --- | --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found | |
| 🌀 Generated Regression Tests | ✅ 13 Passed | See below |
| ⏪ Replay Tests | 🔘 None Found | |
| 🔎 Concolic Coverage Tests | 🔘 None Found | |
| 📊 Coverage | 84.6% | |

🌀 Generated Regression Tests Details

from collections.abc import Generator
from enum import Enum
from typing import cast

import pandas as pd
# imports
import pytest  # used for our unit tests
from langflow.schema.data import Data
from langflow.schema.dataframe import DataFrame
from langflow.schema.message import Message
from langflow.schema.schema import build_output_logs
from langflow.schema.serialize import recursive_serialize_or_str
from pandas import DataFrame as pandas_DataFrame
from pydantic import BaseModel
from typing_extensions import TypedDict

# unit tests

# Mock classes to simulate inputs
class MockVertex:
    def __init__(self, outputs):
        self.outputs = outputs

class MockComponentInstance:
    def __init__(self, status=None, _results=None, _artifacts=None):
        self.status = status
        self._results = _results or {}
        self._artifacts = _artifacts or {}

# Basic Functionality
def test_single_output_string_payload():
    vertex = MockVertex(outputs=[{"name": "output1"}])
    result = [MockComponentInstance(_results={"output1": "simple string"})]
    expected_output = {"output1": {"message": "simple string", "type": "text"}}
    codeflash_output = build_output_logs(vertex, result)

def test_multiple_outputs_simple_data_types():
    vertex = MockVertex(outputs=[{"name": "output1"}, {"name": "output2"}, {"name": "output3"}])
    result = [MockComponentInstance(_results={"output1": "string", "output2": {"key": "value"}, "output3": ["item1", "item2"]})]
    expected_output = {
        "output1": {"message": "string", "type": "text"},
        "output2": {"message": {"key": "value"}, "type": "object"},
        "output3": {"message": ["item1", "item2"], "type": "array"}
    }
    codeflash_output = build_output_logs(vertex, result)

# Handling Complex Data Types
def test_payload_dataframe():
    vertex = MockVertex(outputs=[{"name": "output1"}])
    df = DataFrame([{"col1": "val1", "col2": "val2"}])
    result = [MockComponentInstance(_results={"output1": df})]
    expected_output = {"output1": {"message": [{"col1": "val1", "col2": "val2"}], "type": "array"}}
    codeflash_output = build_output_logs(vertex, result)

def test_payload_message_object():
    vertex = MockVertex(outputs=[{"name": "output1"}])
    msg = Message(text="This is a message")
    result = [MockComponentInstance(_results={"output1": msg})]
    expected_output = {"output1": {"message": "This is a message", "type": "message"}}
    codeflash_output = build_output_logs(vertex, result)

# Handling Streams

def test_stream_with_url():
    vertex = MockVertex(outputs=[{"name": "output1"}])
    result = [MockComponentInstance(_results={"output1": {"stream_url": "http://example.com"}})]
    expected_output = {"output1": {"message": {"location": "http://example.com"}, "type": "stream"}}
    codeflash_output = build_output_logs(vertex, result)

# Handling Edge Cases
def test_empty_payload():
    vertex = MockVertex(outputs=[{"name": "output1"}])
    result = [MockComponentInstance(_results={"output1": None})]
    expected_output = {"output1": {"message": None, "type": "unknown"}}
    codeflash_output = build_output_logs(vertex, result)

def test_unknown_types():
    vertex = MockVertex(outputs=[{"name": "output1"}])
    result = [MockComponentInstance(_results={"output1": 12345})]
    expected_output = {"output1": {"message": 12345, "type": "unknown"}}
    codeflash_output = build_output_logs(vertex, result)

# Error Handling
def test_invalid_dataframe_initialization():
    vertex = MockVertex(outputs=[{"name": "output1"}])
    invalid_data = [Data(data={"key": "value"}), {"key": "value"}]  # Mixed types
    with pytest.raises(ValueError, match="List items must be either all Data objects or all dictionaries"):
        DataFrame(invalid_data)

# Large Scale Test Cases
def test_large_dataframe():
    vertex = MockVertex(outputs=[{"name": "output1"}])
    large_df = DataFrame([{"col1": f"val{i}", "col2": f"val{i}"} for i in range(1000)])
    result = [MockComponentInstance(_results={"output1": large_df})]
    expected_output = {"output1": {"message": large_df.to_dict(orient="records"), "type": "array"}}
    codeflash_output = build_output_logs(vertex, result)

def test_large_list_of_outputs():
    outputs = [{"name": f"output{i}"} for i in range(100)]
    vertex = MockVertex(outputs=outputs)
    result = [MockComponentInstance(_results={f"output{i}": f"value{i}" for i in range(100)})]
    expected_output = {f"output{i}": {"message": f"value{i}", "type": "text"} for i in range(100)}
    codeflash_output = build_output_logs(vertex, result)

# Special Cases
def test_status_handling():
    vertex = MockVertex(outputs=[{"name": "output1"}])
    result = [MockComponentInstance(status=None, _results={"output1": "result"}, _artifacts={})]
    expected_output = {"output1": {"message": "result", "type": "text"}}
    codeflash_output = build_output_logs(vertex, result)


def test_mixed_data_types_in_outputs():
    vertex = MockVertex(outputs=[{"name": "output1"}, {"name": "output2"}, {"name": "output3"}])
    result = [MockComponentInstance(_results={"output1": "text", "output2": {"key": "value"}, "output3": DataFrame([{"col1": "val1"}])})]
    expected_output = {
        "output1": {"message": "text", "type": "text"},
        "output2": {"message": {"key": "value"}, "type": "object"},
        "output3": {"message": [{"col1": "val1"}], "type": "array"}
    }
    codeflash_output = build_output_logs(vertex, result)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
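
The expected outputs in the tests above imply a payload-to-type mapping. An illustrative reconstruction of that mapping follows; the real logic lives in `langflow.schema.schema` and also handles `Message` and `DataFrame` payloads, which this sketch omits, and the function name here is hypothetical:

```python
# Sketch of the type mapping the regression tests exercise.
def guess_type(payload):
    if isinstance(payload, str):
        return "text"
    if isinstance(payload, dict):
        # A dict carrying a stream_url is reported as a stream
        return "stream" if "stream_url" in payload else "object"
    if isinstance(payload, list):
        return "array"
    # None, numbers, and anything unrecognized fall through to "unknown"
    return "unknown"

assert guess_type("simple string") == "text"
assert guess_type({"key": "value"}) == "object"
assert guess_type({"stream_url": "http://example.com"}) == "stream"
assert guess_type(["item1", "item2"]) == "array"
assert guess_type(12345) == "unknown"
```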

📣 **Feedback**

If you have any feedback or need assistance, feel free to join our Discord community.

misrasaurabh1 avatar Dec 17 '24 23:12 misrasaurabh1

CodSpeed Performance Report

Merging #5324 will degrade performance by 26.19%

Comparing codeflash-ai:codeflash/optimize-build_output_logs-2024-12-11T11.39.45 (d1fc2cd) with main (9c23759)

Summary

❌ 2 regressions
✅ 13 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | main | codeflash-ai:codeflash/optimize-build_output_logs-2024-12-11T11.39.45 | Change |
| --- | --- | --- | --- |
| test_successful_run_with_input_type_any | 256.2 ms | 340.1 ms | -24.68% |
| test_successful_run_with_output_type_any | 238.5 ms | 323.1 ms | -26.19% |

codspeed-hq[bot] avatar Dec 18 '24 00:12 codspeed-hq[bot]

Hi! I'm autofix.ci, a bot that automatically fixes trivial issues such as code formatting in pull requests.

I would like to apply some automated changes to this pull request, but it looks like I don't have the necessary permissions to do so. To get this pull request into a mergeable state, please do one of the following two things:

  1. Allow edits by maintainers for your pull request, and then re-trigger CI (for example by pushing a new commit).
  2. Manually fix the issues identified for your pull request (see the GitHub Actions output for details on what I would like to change).

autofix-ci[bot] avatar Jan 09 '25 22:01 autofix-ci[bot]

@cbornet can we merge this approved PR?

misrasaurabh1 avatar Feb 10 '25 07:02 misrasaurabh1

@misrasaurabh1 There is an issue with starter projects. Can you check https://github.com/langflow-ai/langflow/pull/5324#issuecomment-2581342703 and resolve it?

cbornet avatar Feb 10 '25 13:02 cbornet