
Fix Cascading Bugs in Huggingface and Ollama Embedder

Open · kmitul opened this pull request 1 year ago · 1 comment

Description

Note

This pull request builds on PR #1548, which addressed Bug 1. Further testing surfaced additional bugs, which are documented and resolved here.

Summary

This pull request addresses a series of cascading bugs in the embedding layer, particularly when using 'huggingface' as the embedder. Each fix revealed the next issue, and each is documented and resolved in sequence below. Bug 1 was already addressed in an earlier PR; the remaining bugs were discovered afterwards.

Details

  1. Bug 1: ModuleNotFoundError: No module named 'embedding'

    • Files: mem0/embeddings/huggingface.py and mem0/embeddings/ollama.py
    • Description: EmbeddingBase was imported from a non-existent 'embedding' module.
    • Fix: Corrected the import to reference the class's actual parent package.
  2. Bug 2: Value error, Unsupported embedding provider: huggingface

    • File: mem0/embeddings/configs.py
    • Description: With Bug 1 fixed, the config raised this error whenever 'huggingface' was set as the embedder provider, because 'huggingface' was missing from the validator in EmbedderConfig.
    • Fix: Added 'huggingface' to the list of supported providers in EmbedderConfig.
  3. Bug 3: TypeError: Can't instantiate abstract class HuggingFaceEmbedding with abstract method embed

    • File: mem0/embeddings/huggingface.py
    • Description: With Bug 2 fixed, HuggingFaceEmbedding still could not be instantiated because the abstract method embed was not implemented.
    • Fix: Renamed the get_embedding method to embed in HuggingFaceEmbedding.
  4. Bug 4: AttributeError: 'HuggingFaceEmbedding' object has no attribute 'dims'

    • File: mem0/embeddings/huggingface.py
    • Description: With Bug 3 fixed, instantiation failed again because the dims attribute was missing.
    • Fix: Added a dims attribute to HuggingFaceEmbedding, populated via SentenceTransformer's get_sentence_embedding_dimension method.
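Bugs 3 and 4 both stem from the contract that EmbeddingBase imposes on its subclasses. A minimal, self-contained sketch of that pattern (illustrative class names and a hypothetical dims value of 384 — not the actual mem0 source) shows why a wrongly named method leaves the class abstract, and why the missing dims attribute then fails:

```python
from abc import ABC, abstractmethod

# Illustrative stand-in for mem0's EmbeddingBase (not the real source).
class EmbeddingBase(ABC):
    @abstractmethod
    def embed(self, text):
        """Return the embedding vector for the given text."""

class BrokenEmbedding(EmbeddingBase):
    # Wrong method name: embed stays abstract, so instantiation
    # raises TypeError (Bug 3).
    def get_embedding(self, text):
        return [0.0]

class FixedEmbedding(EmbeddingBase):
    def __init__(self):
        # Bug 4 fix: expose the vector dimensionality as .dims
        # (384 is a hypothetical value here).
        self.dims = 384

    def embed(self, text):
        return [0.0] * self.dims

try:
    BrokenEmbedding()
except TypeError as e:
    print(e)  # complains that abstract method embed is unimplemented

emb = FixedEmbedding()
print(len(emb.embed("hello")))  # matches emb.dims
```

Renaming get_embedding to embed satisfies the abstract contract, and setting dims in __init__ gives downstream code (e.g. vector-store setup) the dimensionality it expects.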

Impact

By resolving these cascading issues, the fixes make mem0 reliable with custom embedder configs (such as huggingface and ollama), preventing crashes during initialization and unexpected results afterwards.

Type of change

  • [x] Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Please describe the tests you ran to verify your changes and provide instructions so we can reproduce them. Please also list any relevant details of your test configuration.

Please delete options that are not relevant.

  • [x] Test Script (please provide)

Test Script

This code initializes the Memory class with the specified configuration, ensuring that the embedding provider 'huggingface' and the model 'multi-qa-MiniLM-L6-cos-v1' are correctly set up and functional.

from mem0 import Memory

config = {
    "embedder": {
        "provider": "huggingface",
        "config": {
            "model": "multi-qa-MiniLM-L6-cos-v1"
        }
    }
}

m = Memory.from_config(config)

Steps to Reproduce

  1. Bug 1: Try setting "huggingface" or "ollama" as the embedding provider in the config.
  2. Bug 2: After fixing Bug 1, try to instantiate Memory class using from_config method.
  3. Bug 3: After fixing Bug 2, try to instantiate Memory class using from_config method.
  4. Bug 4: After fixing Bug 3, try to instantiate Memory class using from_config method.
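Bug 2 in the cascade above reduces to a provider whitelist check. A minimal sketch of that validation pattern (hypothetical provider set and function name, not mem0's actual configs.py) illustrates the before/after behavior:

```python
# Hypothetical provider whitelist mimicking the validator in EmbedderConfig;
# before the fix, "huggingface" was absent from a list like this.
SUPPORTED_PROVIDERS = {"openai", "ollama", "huggingface"}

def validate_provider(provider: str) -> str:
    # Unknown providers raise the same kind of error reported in Bug 2.
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Unsupported embedding provider: {provider}")
    return provider

print(validate_provider("huggingface"))  # accepted after the fix
```

With "huggingface" added to the whitelist, the config passes validation and the cascade proceeds to Bug 3.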

Checklist:

  • [x] My code follows the style guidelines of this project
  • [x] I have performed a self-review of my own code
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [x] I have made corresponding changes to the documentation
  • [x] My changes generate no new warnings
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [x] New and existing unit tests pass locally with my changes
  • [x] Any dependent changes have been merged and published in downstream modules
  • [x] I have checked my code and corrected any misspellings

Maintainer Checklist

  • [ ] closes #xxxx (Replace xxxx with the GitHub issue number)
  • [ ] Made sure Checks passed

kmitul avatar Jul 24 '24 08:07 kmitul

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Jul 26 '24 02:07 CLAassistant

Closing this PR as changes have already been applied.

Dev-Khant avatar Oct 08 '24 06:10 Dev-Khant