Zhengyang Tang
Hi, thanks for your great work. I'm trying to reproduce your model, but my results consistently come out about 1 point lower in HR@10. The code is actually not that...
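Discrepancies of roughly this size often trace back to the evaluation protocol (e.g., sampled negatives vs. full-catalog ranking). A minimal sketch of the full-ranking variant of HR@10, with hypothetical score and label arrays, just to make the metric being compared explicit:

```python
import numpy as np

def hit_rate_at_k(scores: np.ndarray, true_items: np.ndarray, k: int = 10) -> float:
    """Fraction of users whose held-out item appears in their top-k list.

    scores: (num_users, num_items) predicted scores over the full catalog.
    true_items: (num_users,) index of each user's single held-out item.
    """
    # argpartition extracts the top-k item indices per user without a full sort
    top_k = np.argpartition(-scores, k, axis=1)[:, :k]
    hits = (top_k == true_items[:, None]).any(axis=1)
    return float(hits.mean())

# Hypothetical example: 1000 users, 5000 items
rng = np.random.default_rng(0)
scores = rng.normal(size=(1000, 5000))
true_items = rng.integers(0, 5000, size=1000)
print(hit_rate_at_k(scores, true_items, k=10))
```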
Do you support training models across multiple GPUs? I'm asking because my sparse matrix is quite large:
- shape: (80M, 60M)
- nnz: 400M
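For context, a back-of-envelope sketch of what a matrix at that scale costs in memory (assuming CSR storage with 64-bit indices and float32 values; the exact figures depend on the actual dtypes):

```python
# Rough memory footprint for a CSR matrix of shape (80M, 60M) with 400M nonzeros
rows, cols, nnz = 80_000_000, 60_000_000, 400_000_000

data_bytes = nnz * 4            # float32 values
indices_bytes = nnz * 8         # int64 column indices
indptr_bytes = (rows + 1) * 8   # int64 row pointers

total_gb = (data_bytes + indices_bytes + indptr_bytes) / 1e9
print(f"~{total_gb:.1f} GB for a single CSR copy")  # ~5.4 GB
```

At ~5.4 GB per copy before any intermediate buffers, it is easy to see why a single GPU becomes the bottleneck.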
Hi, I'm adding DeepSpeed to my distributed model training framework. When using PyTorch's native APIs, everything is fine. For distributed training, originally I would wrap the model in an object of...
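For comparison, a minimal sketch of the two wrapping styles (assuming a standard `torch.nn.Module`, an already-initialized process group, and a hypothetical DeepSpeed JSON config; not tied to the framework in question):

```python
import os
import torch
import deepspeed
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes torch.distributed.init_process_group has already been called
local_rank = int(os.environ.get("LOCAL_RANK", 0))
model = torch.nn.Linear(128, 10)  # stand-in for the real model

# PyTorch native: wrap the module directly; the optimizer stays separate
ddp_model = DDP(model.cuda(local_rank), device_ids=[local_rank])
optimizer = torch.optim.AdamW(ddp_model.parameters(), lr=1e-4)

# DeepSpeed: initialize() returns an engine that owns model, optimizer, and schedule
engine, ds_optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",  # hypothetical config path
)
```

The practical difference is that DeepSpeed's engine replaces the plain module wrapper, so code that expects a bare `nn.Module` (or a DDP wrapper) usually needs adapting.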
Hi, as shown in the basic_retrieval tutorial, `tf.keras.losses.CategoricalCrossentropy` seems to be used as the default loss. 1. I wonder whether there is any difference between that and `tf.nn.sampled_softmax_loss`? In my...
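For what it's worth, the main difference is that the sampled variant approximates the softmax over a small set of sampled negatives rather than the full candidate set. A minimal sketch of each call, with hypothetical shapes and names (`SparseCategoricalCrossentropy` is just the integer-label form of `CategoricalCrossentropy`):

```python
import tensorflow as tf

num_classes, dim, batch = 10_000, 64, 32

# Hypothetical item embeddings and a batch of query embeddings
item_weights = tf.Variable(tf.random.normal([num_classes, dim]))
item_biases = tf.Variable(tf.zeros([num_classes]))
queries = tf.random.normal([batch, dim])
labels = tf.random.uniform([batch, 1], maxval=num_classes, dtype=tf.int64)

# Full softmax: score every class, then cross-entropy over all logits
logits = tf.matmul(queries, item_weights, transpose_b=True) + item_biases
full_loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)(
    tf.squeeze(labels, axis=1), logits
)

# Sampled softmax: approximates the same objective with 100 sampled negatives
sampled_loss = tf.reduce_mean(
    tf.nn.sampled_softmax_loss(
        weights=item_weights,
        biases=item_biases,
        labels=labels,
        inputs=queries,
        num_sampled=100,
        num_classes=num_classes,
    )
)
```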
Hi, since `code-davinci-002` has been deprecated by OpenAI, would you consider releasing the output data produced by your `codex_gsm8k_complex.ipynb` script? Its outputs on the GSM8K test set are definitely important...
Hi, I've been attempting to reproduce an experiment involving fine-tuning the Llama-2-7b-hf model on a random 5% of the training data, using open-instruct's [finetune_with_accelerate.sh](https://github.com/allenai/open-instruct/blob/main/scripts/finetune_with_accelerate.sh). I adhered to...
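The random 5% subset is the step that is easiest to get wrong silently. A minimal sketch of deterministic subsampling; the file names and seed are hypothetical, not taken from open-instruct:

```python
import json
import random

random.seed(42)  # hypothetical seed; the original experiment's split may differ

with open("train.jsonl") as f:  # hypothetical path to the full training set
    examples = [json.loads(line) for line in f]

# Draw a random 5% without replacement
subset = random.sample(examples, k=max(1, len(examples) // 20))

with open("train_5pct.jsonl", "w") as f:
    for ex in subset:
        f.write(json.dumps(ex) + "\n")
```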
Given the model's exceptional capabilities in coding and mathematics, and since accuracy in both domains can be automatically verified from the final results, it would be quite persuasive if your method could match...
Hello, I am currently assessing the performance of Chain-of-experts using new benchmarks, specifically [MAMO](https://huggingface.co/datasets/CardinalOperations/MAMO) and [IndustryOR](https://huggingface.co/datasets/CardinalOperations/IndustryOR). I observed that for existing benchmarks like NL4OPT and ComplexOR, you have implemented few-shot...
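For anyone running the same comparison, a minimal sketch of pulling the two benchmarks (assuming the Hugging Face `datasets` library; the available splits and column names may differ from what is shown here):

```python
from datasets import load_dataset

# Both benchmarks are hosted on the Hugging Face Hub
mamo = load_dataset("CardinalOperations/MAMO")
industry_or = load_dataset("CardinalOperations/IndustryOR")

print(mamo)         # inspect available splits and columns
print(industry_or)
```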