Hima Patel

40 comments by Hima Patel

Thank you for your response. Is this done for all languages in the data?

@guoday Do you then do [repo-level dedup](https://github.com/deepseek-ai/DeepSeek-Coder/issues/42#issuecomment-1825826360) for all programming languages, or just the languages above?

@guoday Thank you for your prompt responses. I was curious whether you ran any ablation studies or evaluations to determine if repo-level concatenation significantly improved model performance.

Do you have your own repo-level benchmark, or do you use a standard one?

OK, thanks, I was aware of those. Once again, I appreciate your prompt responses, and I look forward to reading the technical report from your group. Thanks!

@guoday I was also wondering what you do with the other files, such as build files or metadata files. Thanks!

@guoday Thanks for your response. So, if I understand correctly, you apply fuzzy dedup at the repo level. Is that correct?

@guoday Can you share some details on the model architecture that you used for this work?