
Possible Divide By Zero in volatile_memory_store.py

Open ghadlich opened this issue 2 years ago • 0 comments

https://github.com/microsoft/semantic-kernel/blob/8590ec5e446715e4ec8e0cd7aa59bb54e3f2d09a/python/semantic_kernel/memory/volatile_memory_store.py#L35-L38

A divide by zero is possible here if linalg.norm returns 0, which happens when the query embedding or any stored embedding is a zero vector.

Possible fix:

        # Calculate the L2 norm (Euclidean norm) of the query embedding
        query_norm = linalg.norm(embedding)
        
        # Calculate the L2 norms of each embedding in the collection
        collection_norms = linalg.norm(embedding_array, axis=1)
        
        # Identify valid indices where both the query norm and collection norms are non-zero
        # This step helps to avoid division by zero issues when calculating cosine similarity
        valid_indices = (query_norm != 0) & (collection_norms != 0)
        
        # Initialize an array to store similarity scores, setting them to 0.0 by default
        similarity_scores = array([0.0] * len(embedding_collection))

        # If there are any valid indices (i.e., both query and collection norms are non-zero),
        # calculate the cosine similarity between the query embedding and the collection embeddings
        if valid_indices.any():
            similarity_scores[valid_indices] = (
                # Calculate the dot product between the query embedding and valid collection embeddings
                embedding.dot(embedding_array[valid_indices].T)
                # Normalize the dot product by dividing by the product of the query norm and valid collection norms
                / (query_norm * collection_norms[valid_indices])
            )[0]
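
For anyone who wants to try the idea outside the repository, here is a minimal standalone sketch of the same guard. It is not the semantic-kernel implementation: the function name `safe_cosine_similarity` and its parameter names are hypothetical, and it assumes the query can be flattened to a 1-D vector and the collection is a 2-D array with one embedding per row.

```python
import numpy as np


def safe_cosine_similarity(query: np.ndarray, collection: np.ndarray) -> np.ndarray:
    """Cosine similarity of a query against each row of a collection.

    Hypothetical helper for illustration only. Any pairing that involves a
    zero-norm vector gets a score of 0.0 instead of NaN or inf.
    """
    # Assumption: the query is a single embedding; flatten (1, d) to (d,)
    query = np.asarray(query, dtype=float).ravel()
    collection = np.asarray(collection, dtype=float)

    query_norm = np.linalg.norm(query)
    collection_norms = np.linalg.norm(collection, axis=1)

    # Boolean mask of rows for which the division is well defined
    valid = (query_norm != 0) & (collection_norms != 0)

    # Default every score to 0.0, then fill in only the valid rows
    scores = np.zeros(collection.shape[0])
    if valid.any():
        dots = collection[valid] @ query
        scores[valid] = dots / (query_norm * collection_norms[valid])
    return scores
```

Calling it with a collection that contains an all-zero row, or with an all-zero query, should return finite scores and raise no NumPy runtime warning. Whether 0.0 is the right default is a design choice; a sentinel such as -1.0 would distinguish "undefined" from "orthogonal".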

ghadlich, Mar 21 '23 00:03