Zhengyang Tang
Hi, thanks for your great work. I'm trying to reproduce your model, but my results consistently come out about 1 point lower in HR@10. The code is actually not that...
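Discrepancies of roughly this size often trace back to the evaluation protocol (e.g., sampled negatives vs. full-catalog ranking). A minimal sketch of the full-ranking variant of HR@10, with hypothetical score and label arrays, just to make the metric being compared explicit:

```python
import numpy as np

def hit_rate_at_k(scores: np.ndarray, true_items: np.ndarray, k: int = 10) -> float:
    """Fraction of users whose held-out item appears in their top-k list.

    scores: (num_users, num_items) predicted scores over the full catalog.
    true_items: (num_users,) index of each user's single held-out item.
    """
    # argpartition extracts the top-k item indices per user without a full sort
    top_k = np.argpartition(-scores, k, axis=1)[:, :k]
    hits = (top_k == true_items[:, None]).any(axis=1)
    return float(hits.mean())

# Hypothetical example: 1000 users, 5000 items
rng = np.random.default_rng(0)
scores = rng.normal(size=(1000, 5000))
true_items = rng.integers(0, 5000, size=1000)
print(hit_rate_at_k(scores, true_items, k=10))
```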
Do you support training models across multiple GPUs? I'm asking because my sparse matrix is quite large:
- shape: (80M, 60M)
- nnz: 400M
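For context, a back-of-envelope sketch of what a matrix at that scale costs in memory (assuming CSR storage with 64-bit indices and float32 values; the exact figures depend on the actual dtypes):

```python
# Rough memory footprint for a CSR matrix of shape (80M, 60M) with 400M nonzeros
rows, cols, nnz = 80_000_000, 60_000_000, 400_000_000

data_bytes = nnz * 4            # float32 values
indices_bytes = nnz * 8         # int64 column indices
indptr_bytes = (rows + 1) * 8   # int64 row pointers

total_gb = (data_bytes + indices_bytes + indptr_bytes) / 1e9
print(f"~{total_gb:.1f} GB for a single CSR copy")  # ~5.4 GB
```

At ~5.4 GB per copy before any intermediate buffers, it is easy to see why a single GPU becomes the bottleneck.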
Hi, I'm adding DeepSpeed to my distributed model training framework. When using PyTorch's native APIs, everything is fine. For distributed training, originally I would wrap the model in an object of...
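For comparison, a minimal sketch of the two wrapping styles (assuming a standard `torch.nn.Module`, an already-initialized process group, and a hypothetical DeepSpeed JSON config; not tied to the framework in question):

```python
import os
import torch
import deepspeed
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes torch.distributed.init_process_group has already been called
local_rank = int(os.environ.get("LOCAL_RANK", 0))
model = torch.nn.Linear(128, 10)  # stand-in for the real model

# PyTorch native: wrap the module directly; the optimizer stays separate
ddp_model = DDP(model.cuda(local_rank), device_ids=[local_rank])
optimizer = torch.optim.AdamW(ddp_model.parameters(), lr=1e-4)

# DeepSpeed: initialize() returns an engine that owns model, optimizer, and schedule
engine, ds_optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",  # hypothetical config path
)
```

The practical difference is that DeepSpeed's engine replaces the plain module wrapper, so code that expects a bare `nn.Module` (or a DDP wrapper) usually needs adapting.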
Hi, as shown in the basic_retrieval tutorial, `tf.keras.losses.CategoricalCrossentropy` seems to be used as the default loss. 1. I wonder whether there is any difference between that and `tf.nn.sampled_softmax_loss`? In my...
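For what it's worth, the main difference is that the sampled variant approximates the softmax over a small set of sampled negatives rather than the full candidate set. A minimal sketch of each call, with hypothetical shapes and names (`SparseCategoricalCrossentropy` is just the integer-label form of `CategoricalCrossentropy`):

```python
import tensorflow as tf

num_classes, dim, batch = 10_000, 64, 32

# Hypothetical item embeddings and a batch of query embeddings
item_weights = tf.Variable(tf.random.normal([num_classes, dim]))
item_biases = tf.Variable(tf.zeros([num_classes]))
queries = tf.random.normal([batch, dim])
labels = tf.random.uniform([batch, 1], maxval=num_classes, dtype=tf.int64)

# Full softmax: score every class, then cross-entropy over all logits
logits = tf.matmul(queries, item_weights, transpose_b=True) + item_biases
full_loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)(
    tf.squeeze(labels, axis=1), logits
)

# Sampled softmax: approximates the same objective with 100 sampled negatives
sampled_loss = tf.reduce_mean(
    tf.nn.sampled_softmax_loss(
        weights=item_weights,
        biases=item_biases,
        labels=labels,
        inputs=queries,
        num_sampled=100,
        num_classes=num_classes,
    )
)
```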
Hi, since `code-davinci-002` has been deprecated by OpenAI, would you consider releasing the output data produced by your `codex_gsm8k_complex.ipynb` script? Its outputs on the GSM8K test set are definitely important...
Hi, I've been attempting to reproduce an experiment involving fine-tuning the Llama-2-7b-hf model on a random 5% of the training data, using open-instruct's [finetune_with_accelerate.sh](https://github.com/allenai/open-instruct/blob/main/scripts/finetune_with_accelerate.sh). I adhered to...
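The random 5% subset is the step that is easiest to get wrong silently. A minimal sketch of deterministic subsampling; the file names and seed are hypothetical, not taken from open-instruct:

```python
import json
import random

random.seed(42)  # hypothetical seed; the original experiment's split may differ

with open("train.jsonl") as f:  # hypothetical path to the full training set
    examples = [json.loads(line) for line in f]

# Draw a random 5% without replacement
subset = random.sample(examples, k=max(1, len(examples) // 20))

with open("train_5pct.jsonl", "w") as f:
    for ex in subset:
        f.write(json.dumps(ex) + "\n")
```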
Given the model's exceptional capabilities in coding and mathematics, and since accuracy in both domains can be automatically verified from the final results, it would be quite persuasive if your method could match...
Hello, I am currently assessing the performance of Chain-of-experts using new benchmarks, specifically [MAMO](https://huggingface.co/datasets/CardinalOperations/MAMO) and [IndustryOR](https://huggingface.co/datasets/CardinalOperations/IndustryOR). I observed that for existing benchmarks like NL4OPT and ComplexOR, you have implemented few-shot...
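For anyone running the same comparison, a minimal sketch of pulling the two benchmarks (assuming the Hugging Face `datasets` library; the available splits and column names may differ from what is shown here):

```python
from datasets import load_dataset

# Both benchmarks are hosted on the Hugging Face Hub
mamo = load_dataset("CardinalOperations/MAMO")
industry_or = load_dataset("CardinalOperations/IndustryOR")

print(mamo)         # inspect available splits and columns
print(industry_or)
```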