XLCoST
Code and data for XLCoST: A Benchmark Dataset for Cross-lingual Code Intelligence
Hello @aneeshjain, could you provide a link to download the raw dataset, before preprocessing with CodeGen? I read https://github.com/reddy-lab-code-research/XLCoST/issues/7 but some languages (PHP, C#) don't have a language_processor so cannot...
I saw the Python code data in the nl2codesearch directory of the dataset. However, I ran the 9283 Python code samples in the train.jsonl file one by one, and there were about 4,000...
It looks like running the code/translation scripts as given in the Readme https://github.com/reddy-lab-code-research/XLCoST/tree/main/code/translation/Readme.md hits path issues that make the script fail. Can we relook at the...
Hi, some data have missing logical operators in boolean expressions, causing errors in the code data. For instance: ```{c++} // generation/pair_data_tok_full/C++-Python/train-C++-Python-tok.cpp, line 42, missing "||" if ( ! ( n...
I want to use this dataset. How do I download it?
Hello! 👋 I'm working on a code translation model using your amazing XLCoST dataset. I noticed that the Google Drive link provided in the README for downloading XLCoST_data.zip is currently...