Liang Wang
Acc@k and Hit@k are the same metric under different names: a prediction is considered correct if the gold answer appears in the top-k ranked list. The main difference in...
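A minimal sketch of how either metric can be computed (the function name and argument layout are illustrative, not taken from the original code):

```python
def hit_at_k(ranked_lists, targets, k=10):
    # ranked_lists: one ranked candidate list per query, best candidate first;
    # targets: the gold entity for each query.
    hits = sum(gold in ranked[:k] for ranked, gold in zip(ranked_lists, targets))
    return hits / len(targets)  # fraction of queries with the gold answer in the top-k
```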
Though we only experiment with English datasets, the method is independent of language. To support Chinese, you would need to:

1. Obtain a knowledge graph completion dataset in Chinese.
2. Replace the BERT...
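For the second step, a minimal sketch of swapping in a Chinese encoder via the Hugging Face `transformers` API; `bert-base-chinese` is just one commonly used checkpoint, not a recommendation from the original reply:

```python
from transformers import AutoModel, AutoTokenizer

# Illustrative only: any Chinese pretrained encoder can be substituted here.
tokenizer = AutoTokenizer.from_pretrained('bert-base-chinese')
model = AutoModel.from_pretrained('bert-base-chinese')
```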
I use the following code snippet; please adjust it based on your data format.

```python
def plot_entity_embeddings(path='./vectors.json'):
    import json
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    ...
```
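The body of the snippet is cut off above. A sketch of one plausible way to complete it, assuming `vectors.json` maps entity names to lists of floats and using scikit-learn's t-SNE for the 2-D projection (neither assumption is confirmed by the original):

```python
import json
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_entity_embeddings_sketch(path='./vectors.json'):
    # Assumed input format: {"entity name": [0.1, -0.2, ...], ...}
    with open(path) as f:
        vectors = json.load(f)
    emb = np.asarray(list(vectors.values()), dtype='float32')
    xy = TSNE(n_components=2, random_state=0).fit_transform(emb)  # project to 2-D
    df = pd.DataFrame({'x': xy[:, 0], 'y': xy[:, 1]})
    sns.scatterplot(data=df, x='x', y='y', s=10)
    plt.savefig('entity_embeddings.png', dpi=150)
```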
Hi @quovadisss , the batch size information is available in Table 11 of the Appendix. For pre-training, the batch size is 32k (aggregated across all GPUs). The per-device batch size...
The in-batch negatives are all other documents in the same batch (m is `32k - 1`).
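A minimal sketch of a contrastive loss with in-batch negatives, assuming L2-normalized query/passage embeddings in PyTorch (the temperature value is illustrative):

```python
import torch
import torch.nn.functional as F

def info_nce_loss(q, p, temperature=0.05):
    # q, p: [batch, dim] normalized embeddings; q[i] is paired with p[i],
    # and every other passage in the batch serves as an in-batch negative,
    # so each query sees m = batch_size - 1 negatives.
    scores = q @ p.t() / temperature                     # [batch, batch] similarities
    labels = torch.arange(q.size(0), device=q.device)    # positives on the diagonal
    return F.cross_entropy(scores, labels)
```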
@Iambestfeed Why would you want to include `lm_head` for quantization? Embedding models do not need it anyway.
What about minimizing the MSE loss between the embedding vectors before and after quantization? Using `lm_head` makes little sense for embedding models; it is not fine-tuned together with the other parameters.
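A sketch of that suggestion, assuming `model_fp` and `model_q` are hypothetical callables that return pooled sentence embeddings for the full-precision and quantized model respectively:

```python
import torch
import torch.nn.functional as F

def quantization_mse(model_fp, model_q, input_ids, attention_mask):
    # Calibration objective: make the quantized model's embeddings match
    # the full-precision embeddings on the same batch.
    with torch.no_grad():
        e_fp = model_fp(input_ids, attention_mask)   # [batch, dim], frozen reference
    e_q = model_q(input_ids, attention_mask)         # [batch, dim]
    return F.mse_loss(e_q, e_fp)                     # minimize during calibration
```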
If you want to optimize GPU memory usage and speed up inference, that surely makes sense.
This is known and expected behavior. For tasks like text retrieval or semantic similarity, what matters is the relative order of the scores rather than their absolute values, so...
> @intfloat I haven't looked up what whitening means, but would simply rescaling the score not be adequate? That is, given a score X in `[0.7, 1.0]`, take as score...
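A minimal sketch of the linear rescaling the question describes, assuming scores empirically fall in `[0.7, 1.0]`; both bounds are assumptions and would need to be estimated on your own data:

```python
def rescale(score, lo=0.7, hi=1.0):
    # Min-max rescaling: maps a similarity score from the observed
    # range [lo, hi] onto [0.0, 1.0], clamping values outside it.
    return max(0.0, min(1.0, (score - lo) / (hi - lo)))
```

Note that this changes only the absolute values, not the relative order, so retrieval and ranking results are unaffected.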