mem0 ⚡️ Speed up `BaseLlmConfig._validate_prompt_history()` by 98% in `embedchain/config/llm/base.py`

📄 `BaseLlmConfig._validate_prompt_history()` in `embedchain/config/llm/base.py`

📈 Performance improved by 98% (0.98x faster, i.e. about 1.98x the original speed)

⏱️ Runtime went down from 33.3 microseconds to 16.8 microseconds

Description

Changes Made.

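Use the compiled regex directly: calling `.search()` on the compiled pattern is faster than passing the already compiled pattern to the module-level `re.search()`, which adds an extra lookup and function call on every invocation.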
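As a rough illustration of the change described above, the sketch below contrasts the two call styles. The pattern is copied from the generated tests in this PR. The function names `validate_via_module` and `validate_via_pattern` are hypothetical stand-ins, not the actual method bodies in `embedchain/config/llm/base.py`, and any timings it prints will vary by machine and Python version.

```python
import re
import timeit
from string import Template

# Pattern as shown in the generated tests for this PR.
history_re = re.compile(r"\$\{*history\}*")


def validate_via_module(prompt: Template):
    # Module-level call: re.search() first resolves its pattern argument
    # (here an already compiled pattern) and then delegates to .search().
    return re.search(history_re, prompt.template)


def validate_via_pattern(prompt: Template):
    # Direct call on the compiled pattern skips that extra step.
    return history_re.search(prompt.template)


if __name__ == "__main__":
    prompt = Template("Conversation so far: ${history}")
    for fn in (validate_via_module, validate_via_pattern):
        seconds = timeit.timeit(lambda: fn(prompt), number=100_000)
        print(f"{fn.__name__}: {seconds:.4f} s per 100,000 calls")
```

Both functions return the same match object contents for the same input, so the change should be behavior-preserving; only the per-call overhead differs.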

Fixes # (issue)

Type of change

Please delete options that are not relevant.

[ ] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
[X] Refactor (does not change functionality, e.g. code style improvements, linting)
[ ] Documentation update

How Has This Been Tested?

[X] Test Script (please provide) The new optimized code was tested for correctness. The results are listed below.

🔘 (none found) − ⚙️ Existing Unit Tests

✅ 5 Passed − 🌀 Generated Regression Tests


# imports
import re
from string import Template

# function to test
from embedchain.config.llm.base import BaseLlmConfig

# pattern used by the function under test, reproduced here for reference
history_re = re.compile(r"\$\{*history\}*")


# unit tests
class TestValidatePromptHistory:
    # Basic functionality tests
    def test_valid_prompt_with_history(self):
        prompt = Template("${history}")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

    def test_valid_prompt_with_history_in_text(self):
        prompt = Template("This is a test prompt with ${history} included.")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

    def test_valid_prompt_with_history_at_end(self):
        prompt = Template("Start: ${history} End.")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

    def test_invalid_prompt_without_history(self):
        prompt = Template("This is a test prompt without history.")
        assert BaseLlmConfig._validate_prompt_history(prompt) is None

    def test_invalid_prompt_with_different_placeholder(self):
        prompt = Template("Start: ${context} End.")
        assert BaseLlmConfig._validate_prompt_history(prompt) is None

    def test_invalid_prompt_with_random_text(self):
        prompt = Template("Just some random text.")
        assert BaseLlmConfig._validate_prompt_history(prompt) is None

    # Edge cases
    def test_empty_prompt(self):
        prompt = Template("")
        assert BaseLlmConfig._validate_prompt_history(prompt) is None

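    def test_prompt_with_similar_but_incorrect_patterns(self):
        # "histories" never contains the literal "history", so there is no match
        prompt = Template("${histories}")
        assert BaseLlmConfig._validate_prompt_history(prompt) is None

        # Assuming the unanchored history_re shown above, "${history" matches as a
        # prefix, so these look-alike placeholders are still reported as a match
        prompt = Template("${historyy}")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

        prompt = Template("${history context}")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None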

    def test_prompt_with_multiple_occurrences_of_history(self):
        prompt = Template("${history} and ${history}")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

        prompt = Template("First occurrence: ${history}, second occurrence: ${history}")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

    # Special characters and escaping
    def test_prompt_with_special_characters(self):
        prompt = Template("This is a test prompt with special characters: ${history}!@#$%^&*()")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

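    def test_prompt_with_escaped_sequence(self):
        # Assuming the history_re shown above, the pattern ignores the preceding
        # backslash, so "\${history}" still matches
        prompt = Template("Escaped sequence: \\${history}")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None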

    def test_prompt_with_mixed_special_characters_and_history(self):
        prompt = Template("Mixed special characters and history: ${history} \\${context}")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

    # Large scale test cases
    def test_large_prompt_with_multiple_placeholders(self):
        large_prompt = "${history} " * 1000 + "${context} " * 1000 + "${query} " * 1000
        prompt = Template(large_prompt)
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

    # Template edge cases
    def test_prompt_with_only_placeholders(self):
        prompt = Template("${history}${context}${query}")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

    def test_prompt_with_placeholders_in_various_positions(self):
        prompt = Template("Start: ${history}")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

        prompt = Template("Middle: ${context} ${history} ${query}")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

        prompt = Template("End: ${history}")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

    # Unicode and non-ASCII characters
    def test_prompt_with_unicode_characters(self):
        prompt = Template("Prompt with Unicode: ${history} 😊")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

    def test_prompt_with_non_ascii_characters(self):
        prompt = Template("Non-ASCII characters: ${history} ñ é ü")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

    # Mixed valid and invalid patterns
    def test_prompt_with_mixed_valid_and_invalid_patterns(self):
        prompt = Template("Valid and invalid: ${history} ${histories}")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

        prompt = Template("Another mix: ${context} ${history} ${historyy}")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

Checklist:

[X] My code follows the style guidelines of this project
[X] I have performed a self-review of my own code
[X] I have commented my code, particularly in hard-to-understand areas
[X] I have made corresponding changes to the documentation
[X] My changes generate no new warnings
[X] I have added tests that prove my fix is effective or that my feature works
[X] New and existing unit tests pass locally with my changes
[X] Any dependent changes have been merged and published in downstream modules
[X] I have checked my code and corrected any misspellings

Maintainer Checklist

[ ] closes #xxxx (Replace xxxx with the GitHub issue number)
[ ] Made sure Checks passed

Jun 11 '24 22:06 misrasaurabh1

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Jul 26 '24 02:07 CLAassistant

@misrasaurabh1 Please resolve the merge conflicts.

Aug 01 '24 20:08 Dev-Khant

Hey @misrasaurabh1 thanks for your contribution. Closing this PR for now as there is no publicly verifiable data about the claims made.

Aug 03 '24 05:08 Dev-Khant
