
⚡️ Speed up method `MockEmbeddings.embed_documents` by 9% in `src/backend/tests/integration/utils.py`

Open codeflash-ai[bot] opened this issue 1 year ago • 2 comments

📄 MockEmbeddings.embed_documents() in src/backend/tests/integration/utils.py

📈 Performance improved by 9% (0.09x faster)

⏱️ Runtime went down from 1.08 milliseconds to 997 microseconds

Explanation and details

Without changing the function signature or logic, the most effective optimization targets the list comprehension in the embed_documents method.

Here's the refactored version of the provided code.

Explanation of Changes.

  1. Caching Method Lookup: The single most effective change here is caching the self.mock_embedding method lookup by assigning it to a local variable mock_embedding before using it in the list comprehension. The attribute is then resolved once per call instead of once per element of texts.

This adjustment saves one attribute lookup per element, so the benefit grows with the length of the list texts.
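The change described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the class SketchEmbeddings, both method names, and the body of mock_embedding are assumptions made for this example, since the actual implementation in src/backend/tests/integration/utils.py is not shown here.

```python
from typing import List, Optional


class SketchEmbeddings:
    """Hypothetical stand-in for MockEmbeddings; not the langflow class."""

    def __init__(self) -> None:
        # Assumed state attribute, recording the last batch of texts.
        self.embedded_documents: Optional[List[str]] = None

    @staticmethod
    def mock_embedding(text: str) -> List[float]:
        # Placeholder embedding; the real mock_embedding body is an unknown here.
        return [len(text) / 2, len(text) / 5, len(text) / 10]

    def embed_documents_original(self, texts: List[str]) -> List[List[float]]:
        self.embedded_documents = texts
        # self.mock_embedding is looked up again on every iteration.
        return [self.mock_embedding(text) for text in texts]

    def embed_documents_cached(self, texts: List[str]) -> List[List[float]]:
        self.embedded_documents = texts
        # Resolve the attribute once, then call the local name inside the loop.
        mock_embedding = self.mock_embedding
        return [mock_embedding(text) for text in texts]
```

Both methods return the same values for the same input; only the number of attribute lookups differs, which is why the gain is modest and depends on how cheap mock_embedding itself is.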

If mock_embedding implements a non-trivial operation or if there's an opportunity to optimize it further (e.g., by using vectorization or other algorithmic improvements), those details would be crucial for additional enhancements. Without more details about the implementation of mock_embedding, this remains the primary focus for optimizing runtime.

Correctness verification

The new optimized code was tested for correctness. The results are listed below.

🔘 (none found) − ⚙️ Existing Unit Tests

✅ 23 Passed − 🌀 Generated Regression Tests

# imports
from typing import List

import pytest  # used for our unit tests
# function to test
from langflow.field_typing import Embeddings
from src.backend.tests.integration.utils import MockEmbeddings

# unit tests

@pytest.fixture
def mock_embeddings():
    return MockEmbeddings()
    # Outputs were verified to be equal to the original implementation

def test_single_document(mock_embeddings):
    # Test embedding a single document
    codeflash_output = mock_embeddings.embed_documents(["Hello world"])
    # Outputs were verified to be equal to the original implementation

def test_multiple_documents(mock_embeddings):
    # Test embedding multiple documents
    codeflash_output = mock_embeddings.embed_documents(["Hello world", "Test document"])
    # Outputs were verified to be equal to the original implementation

def test_empty_list(mock_embeddings):
    # Test embedding an empty list
    codeflash_output = mock_embeddings.embed_documents([])
    # Outputs were verified to be equal to the original implementation

def test_very_long_document(mock_embeddings):
    # Test embedding a very long document
    codeflash_output = mock_embeddings.embed_documents(["a" * 10000])
    # Outputs were verified to be equal to the original implementation

def test_special_characters(mock_embeddings):
    # Test embedding documents with special characters
    codeflash_output = mock_embeddings.embed_documents(["Hello, world!", "Test@document#"])
    # Outputs were verified to be equal to the original implementation

def test_unicode_characters(mock_embeddings):
    # Test embedding documents with unicode characters
    codeflash_output = mock_embeddings.embed_documents(["こんにちは", "你好", "안녕하세요"])
    # Outputs were verified to be equal to the original implementation

def test_large_number_of_documents(mock_embeddings):
    # Test embedding a large number of documents
    texts = ["Doc" + str(i) for i in range(10000)]
    codeflash_output = mock_embeddings.embed_documents(texts)
    # Outputs were verified to be equal to the original implementation

def test_large_documents(mock_embeddings):
    # Test embedding large documents
    texts = ["a" * 10000 for _ in range(100)]
    codeflash_output = mock_embeddings.embed_documents(texts)
    # Outputs were verified to be equal to the original implementation

def test_non_string_elements(mock_embeddings):
    # Test embedding with non-string elements in the list
    with pytest.raises(TypeError):
        mock_embeddings.embed_documents([123, None, ["nested", "list"], {"key": "value"}])
    # Outputs were verified to be equal to the original implementation

def test_state_mutation(mock_embeddings):
    # Test that the state is correctly mutated
    texts = ["Hello world", "Test document"]
    mock_embeddings.embed_documents(texts)
    # Outputs were verified to be equal to the original implementation

def test_mixed_type_elements(mock_embeddings):
    # Test embedding with mixed type elements in the list
    with pytest.raises(TypeError):
        mock_embeddings.embed_documents(["Valid string", 123, None, "Another string"])
    # Outputs were verified to be equal to the original implementation

def test_control_characters(mock_embeddings):
    # Test embedding documents with control characters
    codeflash_output = mock_embeddings.embed_documents(["\x00\x01\x02", "\x7F\x80\x81"])
    # Outputs were verified to be equal to the original implementation

def test_combining_characters(mock_embeddings):
    # Test embedding documents with combining characters
    codeflash_output = mock_embeddings.embed_documents(["e\u0301", "a\u0302"])
    # Outputs were verified to be equal to the original implementation

def test_escape_sequences(mock_embeddings):
    # Test embedding documents with escape sequences
    codeflash_output = mock_embeddings.embed_documents(["Line1\nLine2", "Tab\tSeparated"])
    # Outputs were verified to be equal to the original implementation

def test_backslashes(mock_embeddings):
    # Test embedding documents with backslashes
    codeflash_output = mock_embeddings.embed_documents(["Backslash\\Test", "Double\\\\Backslash"])
    # Outputs were verified to be equal to the original implementation

def test_empty_string(mock_embeddings):
    # Test embedding an empty string
    codeflash_output = mock_embeddings.embed_documents([""])
    # Outputs were verified to be equal to the original implementation

def test_single_character(mock_embeddings):
    # Test embedding single character documents
    codeflash_output = mock_embeddings.embed_documents(["a", "b", "c"])
    # Outputs were verified to be equal to the original implementation

def test_extremely_long_string(mock_embeddings):
    # Test embedding an extremely long string
    codeflash_output = mock_embeddings.embed_documents(["a" * 1000000])
    # Outputs were verified to be equal to the original implementation

def test_utf16_encoded_string(mock_embeddings):
    # Test embedding a UTF-16 encoded string
    utf16_string = b'\xff\xfeH\x00e\x00l\x00l\x00o\x00'.decode('utf-16')
    codeflash_output = mock_embeddings.embed_documents([utf16_string])
    # Outputs were verified to be equal to the original implementation

def test_utf32_encoded_string(mock_embeddings):
    # Test embedding a UTF-32 encoded string
    utf32_string = b'\xff\xfe\x00\x00H\x00\x00\x00e\x00\x00\x00l\x00\x00\x00l\x00\x00\x00o\x00\x00\x00'.decode('utf-32')
    codeflash_output = mock_embeddings.embed_documents([utf32_string])
    # Outputs were verified to be equal to the original implementation

def test_surrogate_pairs(mock_embeddings):
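    # Test embedding documents with surrogate pairs
    # Note: in Python each "\uD83D\uDE00"-style literal is two lone surrogate code points, not one emoji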
    codeflash_output = mock_embeddings.embed_documents(["\uD83D\uDE00", "\uD83D\uDC36"])
    # Outputs were verified to be equal to the original implementation

def test_non_printable_ascii(mock_embeddings):
    # Test embedding documents with non-printable ASCII characters
    codeflash_output = mock_embeddings.embed_documents(["\x01\x02\x03", "\x0E\x0F\x10"])
    # Outputs were verified to be equal to the original implementation

def test_non_printable_unicode(mock_embeddings):
    # Test embedding documents with non-printable Unicode characters
    codeflash_output = mock_embeddings.embed_documents(["\u200B\u200C\u200D", "\uFEFF"])
    # Outputs were verified to be equal to the original implementation

🔘 (none found) − ⏪ Replay Tests
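The generated tests call embed_documents without asserting on the result; the comparison against the original implementation happens outside the test bodies. One way such an equivalence check could look is sketched below. The helpers run_and_capture and find_mismatches, and the two embed_* functions, are hypothetical names invented for this illustration and are not part of codeflash or langflow.

```python
from typing import Any, Callable, Iterable, List, Tuple


def run_and_capture(func: Callable[[Any], Any], arg: Any) -> Tuple[str, Any]:
    """Return ("ok", value) on success or ("raised", exception type) on failure."""
    try:
        return ("ok", func(arg))
    except Exception as exc:  # broad on purpose: the failure mode is compared too
        return ("raised", type(exc))


def find_mismatches(
    original: Callable[[Any], Any],
    optimized: Callable[[Any], Any],
    inputs: Iterable[Any],
) -> List[Any]:
    """Collect every input on which the two callables behave differently."""
    return [
        arg
        for arg in inputs
        if run_and_capture(original, arg) != run_and_capture(optimized, arg)
    ]


# Hypothetical pair of implementations standing in for the before/after code.
def embed_original(texts: List[str]) -> List[List[float]]:
    return [[len(text) / 2] for text in texts]


def embed_optimized(texts: List[str]) -> List[List[float]]:
    half_len = lambda text: len(text) / 2  # local name, resolved once
    return [[half_len(text)] for text in texts]


cases = [["Hello world"], [], ["a" * 10000], ["Valid string", 123]]
mismatches = find_mismatches(embed_original, embed_optimized, cases)
```

An empty mismatches list means both versions returned equal values, or raised the same exception type, on every case. The last case mixes a string with an integer so that the error path is exercised as well.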

codeflash-ai[bot], Aug 02 '24 18:08

Pull Request Validation Report

This comment is automatically generated by Conventional PR

Whitelist Report

Whitelist Active Result
Pull request is submitted by a bot and should be ignored
Pull request is a draft and should be ignored
Pull request is made by a whitelisted user and should be ignored
Pull request is submitted by administrators and should be ignored

Result

Pull request matches with one (or more) enabled whitelist criteria. Pull request validation is skipped.

Last Modified at 02 Aug 24 18:11 UTC

github-actions[bot], Aug 02 '24 18:08

This pull request is automatically being deployed by Amplify Hosting (learn more).

Access this pull request here: https://pr-3176.dmtpw4p5recq1.amplifyapp.com

This PR has been automatically closed because the original PR #3216 by EvgenyK1 was closed.

codeflash-ai[bot], Aug 06 '24 14:08