FlagEmbedding
Reason for using type 'str' for the keys of M3 lexical_weights
I noticed that lexical_weights in BGE M3 is a dict of str: float, where each key is the string form of a token index in the tokenizer's vocabulary. Is there a design reason for this? When converting this dict into a token: float form, we first have to convert each string index back into an int index.
https://github.com/FlagOpen/FlagEmbedding/blob/95ab52eb9fd55cfe04b625a4911c63c81f3570b5/FlagEmbedding/bge_m3.py#L75
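The extra conversion step described above can be sketched like this. This is a minimal, self-contained illustration that assumes a toy vocabulary (`TOY_VOCAB`) and hypothetical helper names (`str_keys_to_int`, `str_keys_to_tokens`). It is not part of the FlagEmbedding API. With a real Hugging Face tokenizer, the list lookup would be replaced by `tokenizer.convert_ids_to_tokens(int(token_id))`.

```python
from typing import Dict, List

# Hypothetical toy vocabulary standing in for a real tokenizer's id -> token table.
TOY_VOCAB: List[str] = ["<s>", "</s>", "hello", "world", "!"]


def str_keys_to_int(lexical_weights: Dict[str, float]) -> Dict[int, float]:
    """Turn string-form token ids (e.g. '2') back into int ids."""
    return {int(token_id): weight for token_id, weight in lexical_weights.items()}


def str_keys_to_tokens(lexical_weights: Dict[str, float], vocab: List[str]) -> Dict[str, float]:
    """Map string-form token ids to token strings via an int lookup into the vocabulary."""
    return {vocab[int(token_id)]: weight for token_id, weight in lexical_weights.items()}


if __name__ == "__main__":
    # Example weights shaped like M3 lexical_weights: str(token id) -> float.
    weights = {"2": 0.31, "3": 0.27, "4": 0.05}
    print(str_keys_to_int(weights))                # {2: 0.31, 3: 0.27, 4: 0.05}
    print(str_keys_to_tokens(weights, TOY_VOCAB))  # {'hello': 0.31, 'world': 0.27, '!': 0.05}
```

If the keys were stored as ints, the `int(...)` call would simply disappear. The lookup itself stays the same.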
Thanks for your question. There is no special reason for this.