Results: 10 comments by Zhengyang Tang

@jeffra In contrastive learning, which is currently very popular in the research community, we usually rely on a large batch size to compute the NCE softmax loss. Though the remarkable GPU RAM...
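For context, a minimal PyTorch sketch of the in-batch NCE (InfoNCE) softmax loss being discussed; the function name, temperature, and shapes are illustrative and not from the original thread. Every other example in the batch acts as a negative, which is why the loss benefits from large batch sizes.

```python
import torch
import torch.nn.functional as F

def in_batch_nce_loss(queries, keys, temperature=0.05):
    """In-batch softmax: row i of `queries` and `keys` is a positive pair;
    the other B-1 rows of `keys` act as negatives."""
    queries = F.normalize(queries, dim=-1)
    keys = F.normalize(keys, dim=-1)
    logits = queries @ keys.t() / temperature              # [B, B] similarity matrix
    labels = torch.arange(queries.size(0), device=queries.device)
    return F.cross_entropy(logits, labels)

# The number of negatives per query grows with the batch size B.
loss = in_batch_nce_loss(torch.randn(256, 128), torch.randn(256, 128))
```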

I'm sure that gradient_accumulation cannot serve my purpose, because gradient_accumulation only accumulates gradients at the training-example level. For the contrastive NCE softmax loss, I have to break the inputs...
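A small, self-contained sketch of why plain gradient accumulation is not equivalent here: averaging in-batch losses over micro-batches gives each query far fewer negatives than one loss computed over the full batch (all names and sizes below are illustrative).

```python
import torch
import torch.nn.functional as F

def nce(q, k, t=0.05):
    # in-batch softmax: the positive for row i of q is row i of k
    logits = F.normalize(q, dim=-1) @ F.normalize(k, dim=-1).t() / t
    return F.cross_entropy(logits, torch.arange(q.size(0)))

q, k = torch.randn(8, 16), torch.randn(8, 16)
full = nce(q, k)                                     # 7 in-batch negatives per query
micro = (nce(q[:4], k[:4]) + nce(q[4:], k[4:])) / 2  # only 3 negatives per query
print(full.item(), micro.item())                     # the two losses generally differ
```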

@jeffra I wonder whether you plan to add this feature? If so, is there an expected timeline?

@gzerveas Yes, thanks for the clarification! @jeffra DeepSpeed is critical for us to employ billion-scale models in contrastive learning. Looking forward to your thoughts :)

@benfred Hi, I also have a strong need for multi-GPU support. 1. OOM is my motivation for using multiple GPUs, like index_cpu_to_all_gpus() in faiss. 2. I'm using ALS right now. 3....
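For reference, a minimal sketch of the faiss pattern mentioned above, assuming faiss-gpu is installed and the index fits in CPU RAM while being built; with `co.shard = True` each GPU holds only part of the index, which is what helps with OOM.

```python
import numpy as np
import faiss

d = 64
cpu_index = faiss.IndexFlatIP(d)                          # build the index on CPU first
cpu_index.add(np.random.rand(100_000, d).astype("float32"))

co = faiss.GpuMultipleClonerOptions()
co.shard = True                                           # shard across GPUs instead of replicating
gpu_index = faiss.index_cpu_to_all_gpus(cpu_index, co)    # spread the index over all visible GPUs

queries = np.random.rand(5, d).astype("float32")
distances, ids = gpu_index.search(queries, 10)            # k = 10 nearest neighbors
```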

Same question. Do you evolve the rest of the instruction data with OpenAI GPT-4 or with StarCoder itself?

@maciejkula Thanks for your patient answer. I've seen many papers and experiments showing that in-batch softmax often performs better than other losses. Some combinations like in-batch softmax + one global...

@maciejkula I find your answer really interesting after taking a close look at the `Retrieval` source code. I notice that you've already taken the sampling-bias-corrected paper into account. Especially for the...
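For readers following along, a minimal sketch of how the `Retrieval` task exposes this correction, assuming TensorFlow Recommenders; the embeddings and probabilities are made-up values, and `candidate_sampling_probability` is, as I understand it, the per-candidate probability used for the logQ correction from the sampling-bias-corrected paper.

```python
import tensorflow as tf
import tensorflow_recommenders as tfrs

# Illustrative tensors: 4 query/candidate positive pairs with 32-d embeddings.
query_embeddings = tf.random.normal([4, 32])
candidate_embeddings = tf.random.normal([4, 32])
# Estimated probability of each candidate appearing in a batch (made-up values).
candidate_sampling_probability = tf.constant([0.01, 0.02, 0.005, 0.03])

task = tfrs.tasks.Retrieval()
loss = task(query_embeddings, candidate_embeddings,
            candidate_sampling_probability=candidate_sampling_probability)
```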

@maciejkula @biteorange Hi, I'm a bit lost on how to pass the data to the `Retrieval` task. Say my data is set up as follows: ```python user_embeddings = np.array([ [0.1, 0.2],...
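A minimal, self-contained sketch of one way to feed such arrays to the `Retrieval` task, assuming the arrays are already the tower outputs and that row i of each is a positive pair; the values beyond the first row shown in the question are made up, since the original snippet is truncated.

```python
import numpy as np
import tensorflow as tf
import tensorflow_recommenders as tfrs

user_embeddings = np.array([[0.1, 0.2],
                            [0.3, 0.4]], dtype=np.float32)   # second row is illustrative
item_embeddings = np.array([[0.5, 0.6],
                            [0.7, 0.8]], dtype=np.float32)   # illustrative item tower outputs

task = tfrs.tasks.Retrieval()                                # default in-batch softmax loss
loss = task(tf.constant(user_embeddings), tf.constant(item_embeddings))
print(float(loss))
```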

Hi, for pre-training of deep prompts, we just employ a prompted dual-encoder to perform the RIP task, during which the PLM is kept fixed as a vanilla PLM while the prompts are...
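A minimal sketch of the "frozen PLM, trainable prompts" setup described above. The model name, prompt length, and optimizer are placeholders, the RIP objective itself is omitted, and this simplified variant only prepends prompts at the input layer rather than injecting deep prompts at every layer; the point is that only the prompt parameters receive gradients.

```python
import torch
from transformers import AutoModel, AutoTokenizer

plm = AutoModel.from_pretrained("bert-base-uncased")     # placeholder backbone
for p in plm.parameters():
    p.requires_grad = False                              # keep the PLM a fixed, vanilla encoder

prompt_len, hidden = 16, plm.config.hidden_size
prompts = torch.nn.Parameter(torch.randn(prompt_len, hidden) * 0.02)  # only trainable weights

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tok(["an example passage"], return_tensors="pt")
word_embeds = plm.get_input_embeddings()(batch["input_ids"])           # [1, L, H]
inputs_embeds = torch.cat([prompts.unsqueeze(0), word_embeds], dim=1)  # prepend prompts
attn = torch.cat([torch.ones(1, prompt_len, dtype=batch["attention_mask"].dtype),
                  batch["attention_mask"]], dim=1)
out = plm(inputs_embeds=inputs_embeds, attention_mask=attn)

optimizer = torch.optim.AdamW([prompts], lr=1e-3)        # optimize prompts only; the RIP loss
                                                         # would be computed from `out` here
```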