⚡️ Speed up `BaseLlmConfig._validate_prompt_history()` by 98% in `embedchain/config/llm/base.py`

Open misrasaurabh1 opened this issue 1 year ago • 1 comments

📄 BaseLlmConfig._validate_prompt_history() in embedchain/config/llm/base.py

📈 Performance improved by 98% (0.98x faster)

⏱️ Runtime went down from 33.3 microseconds to 16.8 microseconds
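
The 98% figure matches the two runtimes if speedup is taken as old runtime divided by new runtime, minus one. That formula is an assumption; the description does not say how the percentage was computed. A quick check, with a hypothetical helper name:

```python
def speedup(old_runtime: float, new_runtime: float) -> float:
    """Relative speedup: extra work that fits in the same time (assumed definition)."""
    return old_runtime / new_runtime - 1.0


old_us, new_us = 33.3, 16.8  # runtimes quoted above, in microseconds
print(f"{speedup(old_us, new_us):.0%}")  # prints "98%"
```

Under this definition the runtime roughly halves, which is a different statement from "98% less time".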

Description

Changes Made.

  1. Use the compiled regex directly: calling the compiled pattern's own search method is faster than passing the compiled pattern to re.search.
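
A minimal sketch of the change described above, assuming the original method passed a module-level compiled pattern to re.search. The pattern is the one quoted in the generated tests; the function names are hypothetical and are not taken from embedchain/config/llm/base.py.

```python
import re
import timeit
from string import Template
from typing import Optional

# Same pattern as quoted in the generated tests.
history_re = re.compile(r"\$\{*history\}*")


def validate_via_module(prompt: Template) -> Optional[re.Match]:
    """Assumed 'before' form: goes through the re module's wrapper first."""
    return re.search(history_re, prompt.template)


def validate_via_pattern(prompt: Template) -> Optional[re.Match]:
    """Assumed 'after' form: calls the compiled pattern's method directly."""
    return history_re.search(prompt.template)


def compare_timings(number: int = 100_000) -> dict:
    """Time both forms on one prompt; absolute numbers vary by machine."""
    prompt = Template("Context: ${context} History: ${history} Query: ${query}")
    return {
        "re.search(history_re, ...)": timeit.timeit(
            lambda: validate_via_module(prompt), number=number
        ),
        "history_re.search(...)": timeit.timeit(
            lambda: validate_via_pattern(prompt), number=number
        ),
    }


if __name__ == "__main__":
    for label, seconds in compare_timings().items():
        print(f"{label}: {seconds:.4f}s")
```

Both forms return the same match object contents; only the call overhead differs. Running the script on the target machine gives timing numbers that others can reproduce.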

Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [X] Refactor (does not change functionality, e.g. code style improvements, linting)
  • [ ] Documentation update

How Has This Been Tested?

  • [X] Test Script: the optimized code was tested for correctness; the results are listed below.

🔘 (none found) − ⚙️ Existing Unit Tests

✅ 5 Passed − 🌀 Generated Regression Tests

# imports
import re
from string import Template

import pytest  # used for our unit tests

# function to test
from embedchain.config.llm.base import BaseLlmConfig

# pattern the function under test searches with (shown for reference)
history_re = re.compile(r"\$\{*history\}*")


# unit tests
class TestValidatePromptHistory:
    # Basic functionality tests
    def test_valid_prompt_with_history(self):
        prompt = Template("${history}")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

    def test_valid_prompt_with_history_in_text(self):
        prompt = Template("This is a test prompt with ${history} included.")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

    def test_valid_prompt_with_history_at_end(self):
        prompt = Template("Start: ${history} End.")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

    def test_invalid_prompt_without_history(self):
        prompt = Template("This is a test prompt without history.")
        assert BaseLlmConfig._validate_prompt_history(prompt) is None

    def test_invalid_prompt_with_different_placeholder(self):
        prompt = Template("Start: ${context} End.")
        assert BaseLlmConfig._validate_prompt_history(prompt) is None

    def test_invalid_prompt_with_random_text(self):
        prompt = Template("Just some random text.")
        assert BaseLlmConfig._validate_prompt_history(prompt) is None

    # Edge cases
    def test_empty_prompt(self):
        prompt = Template("")
        assert BaseLlmConfig._validate_prompt_history(prompt) is None

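    def test_prompt_with_similar_but_incorrect_patterns(self):
        # "histories" never spells out "history", so the pattern cannot match.
        prompt = Template("${histories}")
        assert BaseLlmConfig._validate_prompt_history(prompt) is None

        # history_re has no trailing boundary, so "${history" is matched as a
        # prefix of the two placeholders below (assuming the method searches
        # with history_re as defined above).
        prompt = Template("${historyy}")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

        prompt = Template("${history context}")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None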

    def test_prompt_with_multiple_occurrences_of_history(self):
        prompt = Template("${history} and ${history}")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

        prompt = Template("First occurrence: ${history}, second occurrence: ${history}")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

    # Special characters and escaping
    def test_prompt_with_special_characters(self):
        prompt = Template("This is a test prompt with special characters: ${history}!@#$%^&*()")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

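    def test_prompt_with_escaped_sequence(self):
        # The pattern does not look at a preceding backslash, so "${history}"
        # is still found (assuming the method searches with history_re).
        prompt = Template("Escaped sequence: \\${history}")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None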

    def test_prompt_with_mixed_special_characters_and_history(self):
        prompt = Template("Mixed special characters and history: ${history} \\${context}")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

    # Large scale test cases
    def test_large_prompt_with_multiple_placeholders(self):
        large_prompt = "${history} " * 1000 + "${context} " * 1000 + "${query} " * 1000
        prompt = Template(large_prompt)
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

    # Template edge cases
    def test_prompt_with_only_placeholders(self):
        prompt = Template("${history}${context}${query}")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

    def test_prompt_with_placeholders_in_various_positions(self):
        prompt = Template("Start: ${history}")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

        prompt = Template("Middle: ${context} ${history} ${query}")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

        prompt = Template("End: ${history}")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

    # Unicode and non-ASCII characters
    def test_prompt_with_unicode_characters(self):
        prompt = Template("Prompt with Unicode: ${history} 😊")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

    def test_prompt_with_non_ascii_characters(self):
        prompt = Template("Non-ASCII characters: ${history} ñ é ü")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

    # Mixed valid and invalid patterns
    def test_prompt_with_mixed_valid_and_invalid_patterns(self):
        prompt = Template("Valid and invalid: ${history} ${histories}")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

        prompt = Template("Another mix: ${context} ${history} ${historyy}")
        assert BaseLlmConfig._validate_prompt_history(prompt) is not None

Checklist:

  • [X] My code follows the style guidelines of this project
  • [X] I have performed a self-review of my own code
  • [X] I have commented my code, particularly in hard-to-understand areas
  • [X] I have made corresponding changes to the documentation
  • [X] My changes generate no new warnings
  • [X] I have added tests that prove my fix is effective or that my feature works
  • [X] New and existing unit tests pass locally with my changes
  • [X] Any dependent changes have been merged and published in downstream modules
  • [X] I have checked my code and corrected any misspellings

Maintainer Checklist

  • [ ] closes #xxxx (Replace xxxx with the GitHub issue number)
  • [ ] Made sure Checks passed

misrasaurabh1 · Jun 11 '24 22:06

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

CLAassistant · Jul 26 '24 02:07

@misrasaurabh1 Please resolve the merge conflicts.

Dev-Khant · Aug 01 '24 20:08

Hey @misrasaurabh1 thanks for your contribution. Closing this PR for now as there is no publicly verifiable data about the claims made.

Dev-Khant · Aug 03 '24 05:08