Results: 10 comments by Zhengyang Tang

@jeffra In contrastive learning, which is currently very popular in the research community, we usually rely on a large batch size to compute the NCE softmax loss. Though the remarkable GPU RAM...
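For context, a minimal PyTorch sketch of the in-batch NCE (InfoNCE) softmax loss being discussed; the function name, temperature, and shapes are illustrative and not from the original thread. Every other example in the batch acts as a negative, which is why the loss benefits from large batch sizes.

```python
import torch
import torch.nn.functional as F

def in_batch_nce_loss(queries, keys, temperature=0.05):
    """In-batch softmax: row i of `queries` and `keys` is a positive pair;
    the other B-1 rows of `keys` act as negatives."""
    queries = F.normalize(queries, dim=-1)
    keys = F.normalize(keys, dim=-1)
    logits = queries @ keys.t() / temperature              # [B, B] similarity matrix
    labels = torch.arange(queries.size(0), device=queries.device)
    return F.cross_entropy(logits, labels)

# The number of negatives per query grows with the batch size B.
loss = in_batch_nce_loss(torch.randn(256, 128), torch.randn(256, 128))
```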

I'm sure that gradient_accumulation cannot serve my purpose, because gradient_accumulation only accumulates gradients at the training-example level. For the contrastive NCE softmax loss, I have to break the inputs...
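A small, self-contained sketch of why plain gradient accumulation is not equivalent here: averaging in-batch losses over micro-batches gives each query far fewer negatives than one loss computed over the full batch (all names and sizes below are illustrative).

```python
import torch
import torch.nn.functional as F

def nce(q, k, t=0.05):
    # in-batch softmax: the positive for row i of q is row i of k
    logits = F.normalize(q, dim=-1) @ F.normalize(k, dim=-1).t() / t
    return F.cross_entropy(logits, torch.arange(q.size(0)))

q, k = torch.randn(8, 16), torch.randn(8, 16)
full = nce(q, k)                                     # 7 in-batch negatives per query
micro = (nce(q[:4], k[:4]) + nce(q[4:], k[4:])) / 2  # only 3 negatives per query
print(full.item(), micro.item())                     # the two losses generally differ
```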

@jeffra I wonder whether you plan to add this feature? If so, is there an expected timeline?

@gzerveas Yes, thanks for the clarification! @jeffra DeepSpeed is critical for us to employ billion-scale models in contrastive learning. Looking forward to your thoughts :)

@benfred Hi, I also have a strong need for multi-GPU support. 1. OOM is my motivation for using multiple GPUs, like index_cpu_to_all_gpus() in faiss. 2. I'm using ALS right now. 3....
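For reference, a minimal sketch of the faiss pattern mentioned above, assuming faiss-gpu is installed and the index fits in CPU RAM while being built; with `co.shard = True` each GPU holds only part of the index, which is what helps with OOM.

```python
import numpy as np
import faiss

d = 64
cpu_index = faiss.IndexFlatIP(d)                          # build the index on CPU first
cpu_index.add(np.random.rand(100_000, d).astype("float32"))

co = faiss.GpuMultipleClonerOptions()
co.shard = True                                           # shard across GPUs instead of replicating
gpu_index = faiss.index_cpu_to_all_gpus(cpu_index, co)    # spread the index over all visible GPUs

queries = np.random.rand(5, d).astype("float32")
distances, ids = gpu_index.search(queries, 10)            # k = 10 nearest neighbors
```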

Same question. Do you evolve the rest of the instruction data with OpenAI GPT-4 or with StarCoder itself?

@maciejkula Thanks for your patient answer. I've seen many papers and experiments showing that in-batch softmax often performs better than other losses. Some combinations like in-batch softmax + one global...

@maciejkula I find your answer really interesting after taking a close look at the `Retrieval` source code. I notice that you've already taken the sampling-bias-corrected paper into account. Especially for the...
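For readers following along, a minimal sketch of how the `Retrieval` task exposes this correction, assuming TensorFlow Recommenders; the embeddings and probabilities are made-up values, and `candidate_sampling_probability` is, as I understand it, the per-candidate probability used for the logQ correction from the sampling-bias-corrected paper.

```python
import tensorflow as tf
import tensorflow_recommenders as tfrs

# Illustrative tensors: 4 query/candidate positive pairs with 32-d embeddings.
query_embeddings = tf.random.normal([4, 32])
candidate_embeddings = tf.random.normal([4, 32])
# Estimated probability of each candidate appearing in a batch (made-up values).
candidate_sampling_probability = tf.constant([0.01, 0.02, 0.005, 0.03])

task = tfrs.tasks.Retrieval()
loss = task(query_embeddings, candidate_embeddings,
            candidate_sampling_probability=candidate_sampling_probability)
```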

@maciejkula @biteorange Hi, I'm a bit lost on how to pass the data to the `Retrieval` task. Say my data is set up as follows: ```python user_embeddings = np.array([ [0.1, 0.2],...
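A minimal, self-contained sketch of one way to feed such arrays to the `Retrieval` task, assuming the arrays are already the tower outputs and that row i of each is a positive pair; the values beyond the first row shown in the question are made up, since the original snippet is truncated.

```python
import numpy as np
import tensorflow as tf
import tensorflow_recommenders as tfrs

user_embeddings = np.array([[0.1, 0.2],
                            [0.3, 0.4]], dtype=np.float32)   # second row is illustrative
item_embeddings = np.array([[0.5, 0.6],
                            [0.7, 0.8]], dtype=np.float32)   # illustrative item tower outputs

task = tfrs.tasks.Retrieval()                                # default in-batch softmax loss
loss = task(tf.constant(user_embeddings), tf.constant(item_embeddings))
print(float(loss))
```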

Hi, for pre-training of deep prompts, we just employ a prompted dual-encoder to perform the RIP task, during which the PLM is kept fixed as a vanilla PLM while the prompts are...
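A minimal sketch of the "frozen PLM, trainable prompts" setup described above. The model name, prompt length, and optimizer are placeholders, the RIP objective itself is omitted, and this simplified variant only prepends prompts at the input layer rather than injecting deep prompts at every layer; the point is that only the prompt parameters receive gradients.

```python
import torch
from transformers import AutoModel, AutoTokenizer

plm = AutoModel.from_pretrained("bert-base-uncased")     # placeholder backbone
for p in plm.parameters():
    p.requires_grad = False                              # keep the PLM a fixed, vanilla encoder

prompt_len, hidden = 16, plm.config.hidden_size
prompts = torch.nn.Parameter(torch.randn(prompt_len, hidden) * 0.02)  # only trainable weights

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tok(["an example passage"], return_tensors="pt")
word_embeds = plm.get_input_embeddings()(batch["input_ids"])           # [1, L, H]
inputs_embeds = torch.cat([prompts.unsqueeze(0), word_embeds], dim=1)  # prepend prompts
attn = torch.cat([torch.ones(1, prompt_len, dtype=batch["attention_mask"].dtype),
                  batch["attention_mask"]], dim=1)
out = plm(inputs_embeds=inputs_embeds, attention_mask=attn)

optimizer = torch.optim.AdamW([prompts], lr=1e-3)        # optimize prompts only; the RIP loss
                                                         # would be computed from `out` here
```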