mlx-swift-examples

Halved bge-large model gives NaN embeddings when used from Swift

Open · jrturton opened this issue 4 months ago · 0 comments

I created a halved (float16) version of bge-large using the following Python code:

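```python
from transformers import AutoModel, AutoTokenizer

# model_path and tokenizer_path are local output directories.
hf_model = AutoModel.from_pretrained("BAAI/bge-large-en-v1.5")
hf_model.half()  # cast floating-point parameters and buffers to float16
hf_model.save_pretrained(model_path)

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-en-v1.5")
tokenizer.save_pretrained(tokenizer_path)
```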

This appears to work fine. However, loading this model with MLXEmbedders:

```swift
let config = MLXEmbedders.ModelConfiguration(directory: url)
let model = try await MLXEmbedders.loadModelContainer(configuration: config)
```

will produce NaN embeddings for certain texts when used as follows:

```swift
import Accelerate  // vDSP
import MLX
import MLXEmbedders

let embedding = await model.perform { (model: EmbeddingModel, tokenizer, _) -> [Double] in
    // Tokenize a single, unpadded text including special tokens.
    let inputTokens = tokenizer.encode(text: source, addSpecialTokens: true)
    let padded = MLXArray(inputTokens)
    // Attend to every token; use a single token-type segment.
    let mask = MLXArray.ones(like: padded).asType(.bool)
    let tokenTypes = MLXArray.zeros(like: padded)
    // Add a batch dimension of 1 to each input.
    let modelOutput = model(
        padded.expandedDimensions(axis: 0), positionIds: nil,
        tokenTypeIds: tokenTypes.expandedDimensions(axis: 0),
        attentionMask: mask.expandedDimensions(axis: 0))
    // Pool on the first token and normalize the result.
    let pooler = Pooling(strategy: .first)
    let result = pooler(modelOutput, normalize: true, applyLayerNorm: false)
    result.eval()
    let squeezed = result.squeezed()
    return vDSP.floatToDouble(squeezed.asArray(Float.self))
}
```
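One possible explanation, which this report does not confirm, is float16 overflow. `half()` stores values as IEEE half precision, whose largest finite value is 65504. If an intermediate activation computed in float16 exceeds that, it becomes `inf`, and a later step such as the max subtraction in a softmax then evaluates `inf - inf`, which is NaN and propagates into the pooled embedding. The stdlib-only Python sketch below illustrates only that mechanism: `to_float16`, `softmax_fp16`, and `softmax` are hypothetical helpers, float16 rounding is emulated with `struct`'s `e` format, and nothing here reflects how MLXEmbedders actually computes attention.

```python
import math
import struct

FLOAT16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_float16(x: float) -> float:
    """Round x to the nearest float16 (hypothetical helper).

    struct's 'e' format rounds to half precision but raises OverflowError
    for finite values that are too large; map those to +/-inf as a real
    float16 cast would.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def softmax_fp16(logits):
    """Naive softmax with every intermediate rounded to float16."""
    xs = [to_float16(v) for v in logits]
    m = max(xs)
    # If m is inf, inf - inf yields NaN here.
    exps = [to_float16(math.exp(to_float16(v - m))) for v in xs]
    total = to_float16(sum(exps))
    return [to_float16(e / total) for e in exps]


def softmax(logits):
    """The same softmax in Python's float64, for comparison."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# A plausible attention score: 300 * 300 = 90000 exceeds FLOAT16_MAX.
score = 300.0 * 300.0
print(to_float16(score))                     # inf
print(softmax_fp16([score, 1.0, 2.0]))       # [nan, nan, nan]
print(softmax([score, 1.0, 2.0]))            # [1.0, 0.0, 0.0]
```

Since the Python pipeline handles the same float16 files without NaNs, if overflow is the cause the difference would more likely lie in which intermediate computations each implementation performs in float16 than in the stored weights themselves. That is an assumption worth checking, not a finding.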

The same halved model files, given the same text, work fine from Python:

```python
model = EmbeddingModel(model_path=model_path,
                       pooling_strategy="first",
                       normalize=True,
                       max_length=512)
embs = model.encode(texts, show_progress=False)
```
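To narrow down which texts are affected, one option is to export the Swift embedding (for example as a JSON array) and compare it with the Python embedding for the same text. `compare_embeddings` below is a hypothetical, stdlib-only helper: it counts NaNs on either side and, when there are none, reports the cosine similarity. Because both pipelines are configured with normalization enabled, equivalent outputs should give a cosine close to 1; the `tol` threshold is an arbitrary assumption.

```python
import math


def compare_embeddings(a, b, tol=1e-2):
    """Compare two embeddings of the same text (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    # Count NaNs on either side; cosine similarity is meaningless if any exist.
    nan_count = sum(math.isnan(v) for v in a) + sum(math.isnan(v) for v in b)
    if nan_count:
        return {"nan_count": nan_count, "cosine": math.nan, "match": False}
    norm_a = math.sqrt(sum(v * v for v in a))
    norm_b = math.sqrt(sum(v * v for v in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return {"nan_count": 0, "cosine": math.nan, "match": False}
    cosine = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    return {"nan_count": 0, "cosine": cosine, "match": cosine >= 1.0 - tol}


print(compare_embeddings([math.nan, 0.0], [1.0, 0.0]))
# {'nan_count': 1, 'cosine': nan, 'match': False}
```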

Using the original (un-halved) model in Swift, via `MLXEmbedders.loadModelContainer(configuration: .bge_large)`, works fine.

The model files are too large to attach to the issue; see rdar://154959818 for details.

jrturton, Jul 14 '25 09:07