Why are the keys of M3 lexical_weights of type 'str'?

Open • wxywb opened this issue 1 year ago • 1 comment

I noticed that the lexical_weights of BGE M3 is a dict of str: float, where each key is the string form of an index into the tokenizer's vocabulary. Is there a design reason for this? When converting this dict into a token: float form, we first have to convert each string-form index back into an int index.

https://github.com/FlagOpen/FlagEmbedding/blob/95ab52eb9fd55cfe04b625a4911c63c81f3570b5/FlagEmbedding/bge_m3.py#L75
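The extra conversion step described above can be sketched as follows. This is a minimal illustration, not FlagEmbedding code: `to_int_keys`, `to_token_keys`, `example_weights`, and `toy_vocab` are hypothetical names, and the ids, weights, and vocabulary are made up rather than taken from a real BGE M3 output or tokenizer.

```python
from typing import Dict, Mapping

# Hypothetical sample: ids and weights are made up for illustration,
# not real BGE M3 output or real vocabulary ids.
example_weights: Dict[str, float] = {"101": 0.25, "2048": 0.18, "7": 0.03}

# Hypothetical toy vocabulary standing in for the tokenizer's id -> token mapping.
toy_vocab: Dict[int, str] = {7: "the", 101: "flag", 2048: "embedding"}


def to_int_keys(weights: Mapping[str, float]) -> Dict[int, float]:
    """Cast each string-form vocabulary index back to an int index."""
    return {int(key): weight for key, weight in weights.items()}


def to_token_keys(weights: Mapping[str, float],
                  id_to_token: Mapping[int, str]) -> Dict[str, float]:
    """Map each string-form index to its token via an id -> token lookup.

    The key must be cast with int() first, because the lookup table is
    indexed by integer ids, not by their string form.
    """
    return {id_to_token[int(key)]: weight for key, weight in weights.items()}


if __name__ == "__main__":
    print(to_int_keys(example_weights))
    print(to_token_keys(example_weights, toy_vocab))
```

With a real Hugging Face tokenizer, the `toy_vocab` lookup would presumably be replaced by something like `tokenizer.convert_ids_to_tokens(int(key))`; the `int(key)` cast is the step the question is about.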

wxywb • Mar 07 '24 02:03

Thanks for your question. There is no special reason for this.

hanhainebula • Mar 07 '24 12:03