langflow
⚡️ Speed up method `MockEmbeddings.embed_documents` by 9% in `src/backend/tests/integration/utils.py`
📄 MockEmbeddings.embed_documents() in src/backend/tests/integration/utils.py
📈 Performance improved by 9% (0.09x faster)
⏱️ Runtime went down from 1.08 milliseconds to 997 microseconds
Explanation and details
There are several places where this code could be optimized. Without changing the function signature or logic, the most effective change targets the list comprehension in the `embed_documents` method.
Here's the refactored version of the provided code.
Explanation of Changes.
- Caching the method lookup: The most effective change is to cache the `self.mock_embedding` method lookup in a local variable, `mock_embedding`, before the list comprehension runs. This avoids resolving the attribute on `self` again for every element.
The saving per element is small, so the effect is most noticeable when `texts` is large.
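Here is a minimal sketch of the lookup-caching change. It uses a hypothetical `MockEmbeddingsSketch` class. Its `mock_embedding` is assumed to be a static, length-based toy function, and the real `MockEmbeddings` in `src/backend/tests/integration/utils.py` may differ. The sketch checks that both variants return the same vectors, then times each on 100 and 10,000 documents with `timeit`.

```python
import timeit
from typing import List


class MockEmbeddingsSketch:
    """Hypothetical stand-in for MockEmbeddings; the real class may differ."""

    @staticmethod
    def mock_embedding(text: str) -> List[float]:
        # Assumed toy embedding derived only from the text length.
        return [len(text) / 2, len(text) / 5, len(text) / 10]

    def embed_documents_original(self, texts: List[str]) -> List[List[float]]:
        # Resolves self.mock_embedding on every iteration.
        return [self.mock_embedding(text) for text in texts]

    def embed_documents_cached(self, texts: List[str]) -> List[List[float]]:
        # Resolves the attribute once, then calls the local name in the loop.
        mock_embedding = self.mock_embedding
        return [mock_embedding(text) for text in texts]


if __name__ == "__main__":
    emb = MockEmbeddingsSketch()
    for size in (100, 10_000):
        texts = ["Doc" + str(i) for i in range(size)]
        # Both variants must produce identical output before timing them.
        assert emb.embed_documents_original(texts) == emb.embed_documents_cached(texts)
        for name in ("embed_documents_original", "embed_documents_cached"):
            method = getattr(emb, name)
            # Best of 5 runs of 20 calls each, reported per call.
            secs = min(timeit.repeat(lambda: method(texts), repeat=5, number=20)) / 20
            print(f"{size:>6} docs  {name:<26} {secs * 1e6:8.1f} us")
```

Use the minimum over repeats rather than the mean because it is the figure least affected by other processes. Expect only a modest difference, in line with the single-digit percentage reported above.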
Any further gains depend on what `mock_embedding` does. If it performs non-trivial work, vectorization or other algorithmic improvements might help. Without more detail about its implementation, caching the lookup remains the main way to reduce runtime.
Correctness verification
The new optimized code was tested for correctness. The results are listed below.
🔘 (none found) − ⚙️ Existing Unit Tests
✅ 23 Passed − 🌀 Generated Regression Tests
# imports
from typing import List

import pytest  # used for our unit tests

# function to test
from langflow.field_typing import Embeddings
from src.backend.tests.integration.utils import MockEmbeddings

# unit tests


@pytest.fixture
def mock_embeddings():
    return MockEmbeddings()


# Outputs were verified to be equal to the original implementation
def test_single_document(mock_embeddings):
    # Test embedding a single document
    codeflash_output = mock_embeddings.embed_documents(["Hello world"])

# Outputs were verified to be equal to the original implementation
def test_multiple_documents(mock_embeddings):
    # Test embedding multiple documents
    codeflash_output = mock_embeddings.embed_documents(["Hello world", "Test document"])

# Outputs were verified to be equal to the original implementation
def test_empty_list(mock_embeddings):
    # Test embedding an empty list
    codeflash_output = mock_embeddings.embed_documents([])

# Outputs were verified to be equal to the original implementation
def test_very_long_document(mock_embeddings):
    # Test embedding a very long document
    codeflash_output = mock_embeddings.embed_documents(["a" * 10000])

# Outputs were verified to be equal to the original implementation
def test_special_characters(mock_embeddings):
    # Test embedding documents with special characters
    codeflash_output = mock_embeddings.embed_documents(["Hello, world!", "Test@document#"])

# Outputs were verified to be equal to the original implementation
def test_unicode_characters(mock_embeddings):
    # Test embedding documents with unicode characters
    codeflash_output = mock_embeddings.embed_documents(["こんにちは", "你好", "안녕하세요"])

# Outputs were verified to be equal to the original implementation
def test_large_number_of_documents(mock_embeddings):
    # Test embedding a large number of documents
    texts = ["Doc" + str(i) for i in range(10000)]
    codeflash_output = mock_embeddings.embed_documents(texts)

# Outputs were verified to be equal to the original implementation
def test_large_documents(mock_embeddings):
    # Test embedding large documents
    texts = ["a" * 10000 for _ in range(100)]
    codeflash_output = mock_embeddings.embed_documents(texts)

# Outputs were verified to be equal to the original implementation
def test_non_string_elements(mock_embeddings):
    # Test embedding with non-string elements in the list
    with pytest.raises(TypeError):
        mock_embeddings.embed_documents([123, None, ["nested", "list"], {"key": "value"}])

# Outputs were verified to be equal to the original implementation
def test_state_mutation(mock_embeddings):
    # Test that the state is correctly mutated
    texts = ["Hello world", "Test document"]
    mock_embeddings.embed_documents(texts)

# Outputs were verified to be equal to the original implementation
def test_mixed_type_elements(mock_embeddings):
    # Test embedding with mixed type elements in the list
    with pytest.raises(TypeError):
        mock_embeddings.embed_documents(["Valid string", 123, None, "Another string"])

# Outputs were verified to be equal to the original implementation
def test_control_characters(mock_embeddings):
    # Test embedding documents with control characters
    codeflash_output = mock_embeddings.embed_documents(["\x00\x01\x02", "\x7F\x80\x81"])

# Outputs were verified to be equal to the original implementation
def test_combining_characters(mock_embeddings):
    # Test embedding documents with combining characters
    codeflash_output = mock_embeddings.embed_documents(["e\u0301", "a\u0302"])

# Outputs were verified to be equal to the original implementation
def test_escape_sequences(mock_embeddings):
    # Test embedding documents with escape sequences
    codeflash_output = mock_embeddings.embed_documents(["Line1\nLine2", "Tab\tSeparated"])

# Outputs were verified to be equal to the original implementation
def test_backslashes(mock_embeddings):
    # Test embedding documents with backslashes
    codeflash_output = mock_embeddings.embed_documents(["Backslash\\Test", "Double\\\\Backslash"])

# Outputs were verified to be equal to the original implementation
def test_empty_string(mock_embeddings):
    # Test embedding an empty string
    codeflash_output = mock_embeddings.embed_documents([""])

# Outputs were verified to be equal to the original implementation
def test_single_character(mock_embeddings):
    # Test embedding single character documents
    codeflash_output = mock_embeddings.embed_documents(["a", "b", "c"])

# Outputs were verified to be equal to the original implementation
def test_extremely_long_string(mock_embeddings):
    # Test embedding an extremely long string
    codeflash_output = mock_embeddings.embed_documents(["a" * 1000000])

# Outputs were verified to be equal to the original implementation
def test_utf16_encoded_string(mock_embeddings):
    # Test embedding a UTF-16 encoded string
    utf16_string = b'\xff\xfeH\x00e\x00l\x00l\x00o\x00'.decode('utf-16')
    codeflash_output = mock_embeddings.embed_documents([utf16_string])

# Outputs were verified to be equal to the original implementation
def test_utf32_encoded_string(mock_embeddings):
    # Test embedding a UTF-32 encoded string
    utf32_string = b'\xff\xfe\x00\x00H\x00\x00\x00e\x00\x00\x00l\x00\x00\x00l\x00\x00\x00o\x00\x00\x00'.decode('utf-32')
    codeflash_output = mock_embeddings.embed_documents([utf32_string])

# Outputs were verified to be equal to the original implementation
def test_surrogate_pairs(mock_embeddings):
    # Test embedding documents with surrogate pairs
    codeflash_output = mock_embeddings.embed_documents(["\uD83D\uDE00", "\uD83D\uDC36"])

# Outputs were verified to be equal to the original implementation
def test_non_printable_ascii(mock_embeddings):
    # Test embedding documents with non-printable ASCII characters
    codeflash_output = mock_embeddings.embed_documents(["\x01\x02\x03", "\x0E\x0F\x10"])

# Outputs were verified to be equal to the original implementation
def test_non_printable_unicode(mock_embeddings):
    # Test embedding documents with non-printable Unicode characters
    codeflash_output = mock_embeddings.embed_documents(["\u200B\u200C\u200D", "\uFEFF"])

# Outputs were verified to be equal to the original implementation
🔘 (none found) − ⏪ Replay Tests
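Each generated test is annotated as having outputs verified to be equal to the original implementation. The tool's actual verification mechanism is not shown here. The following is a minimal sketch of one way to run such a check, assuming both implementations are plain callables. For each input it compares the return values and, when a call fails, the exception type. This matters for cases like `test_non_string_elements`, where both versions are expected to raise `TypeError`. `find_mismatches`, `toy_embedding`, and the two `embed_*` functions are hypothetical.

```python
from typing import Any, Callable, Iterable, List, Tuple


def run(func: Callable[[Any], Any], arg: Any) -> Tuple[str, Any]:
    """Return ("ok", result) or ("raise", exception type) for a single call."""
    try:
        return "ok", func(arg)
    except Exception as exc:  # compare failure modes, not only return values
        return "raise", type(exc)


def find_mismatches(original: Callable, optimized: Callable, inputs: Iterable[Any]) -> List[Any]:
    """Return the inputs on which the two implementations behave differently."""
    return [arg for arg in inputs if run(original, arg) != run(optimized, arg)]


# Hypothetical implementations standing in for the original and optimized methods.
def toy_embedding(text: str) -> List[float]:
    return [len(text) / 2, len(text) / 5, len(text) / 10]


def embed_original(texts):
    return [toy_embedding(t) for t in texts]


def embed_optimized(texts):
    fn = toy_embedding  # local alias, mirroring the cached lookup
    return [fn(t) for t in texts]


cases = [[], ["Hello world"], ["こんにちは", "你好"], ["a" * 10_000], [123, None]]
print(find_mismatches(embed_original, embed_optimized, cases))  # []
```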
Pull Request Validation Report
This comment is automatically generated by Conventional PR
Whitelist Report
| Whitelist | Active | Result |
|---|---|---|
| Pull request is submitted by a bot and should be ignored | ✅ | ✅ |
| Pull request is a draft and should be ignored | ✅ | ❌ |
| Pull request is made by a whitelisted user and should be ignored | ❌ | ❌ |
| Pull request is submitted by administrators and should be ignored | ❌ | ❌ |
Result
Pull request matches with one (or more) enabled whitelist criteria. Pull request validation is skipped.
Last Modified at 02 Aug 24 18:11 UTC
This pull request is automatically being deployed by Amplify Hosting.
Access this pull request here: https://pr-3176.dmtpw4p5recq1.amplifyapp.com
This PR has been automatically closed because the original PR #3216 by EvgenyK1 was closed.