CodeGen

Reference implementation of code generation projects from Facebook AI Research. General toolkit to apply machine learning to code, from dataset creation to model training and evaluation. Comes with pr...

Results 38 CodeGen issues

I have a few questions. 1. I have used the data in the folder [https://github.com/facebookresearch/CodeGen/tree/main/data/test_dataset](https://github.com/facebookresearch/CodeGen/tree/main/data/test_dataset) and learned BPE codes and vocab using Monolingual Functions mode. I want to know how to...

question

We have pre-processed our data in csharp and ruby and obtained their monolingual data. We are now trying to run the MLM step by reloading TransCoder_model_1, but in...

Hi, I am trying to write a language processor for C#. I can't seem to find any documentation or comments that explain the logic behind the `extract_function` method that I...

I am wondering if I can add new languages for code translation. For example, I want to translate COBOL code to Python. Do you have any tips if you can...

question

Can you please share the untokenized version of [transcoder_test_set.zip](https://dl.fbaipublicfiles.com/transcoder/test_set/transcoder_test_set.zip)? I would like to tokenize it in a different way. Thank you in advance!

The command is `!python -m codegen_sources.preprocessing.preprocess data/test_dataset/ --langs cpp java python --mode=monolingual --local=True --fastbpe_vocab_path=/content/CodeGen/data/bpe/cpp-java-python/vocab --fastbpe_code_path=/content/CodeGen/data/bpe/cpp-java-python/codes --bpe_mode=fast --train_splits=1 --percent_test_valid=10` When you train Transcoder from your previous checkpoint you got such lines:...

I hit the assertion `AssertionError: failed to learn bpe on /media/Z/dungnm31/transcoder/cpp-java-python.monolingual.tok.shuf.50gb, command: /home/dungnm/CodeGen/fastBPE/fast learnbpe 50000 /media/Z/dungnm31/transcoder/cpp-java-python.monolingual.tok.shuf.50gb > /media/Z/dungnm31/transcoder/cpp-java-python.monolingual.codes` It turns out the command itself was not right. The `fastBPE` path will...

I am trying to create the self-training dataset, as per the instructions at https://github.com/facebookresearch/CodeGen/blob/main/docs/TransCoder-ST.md. From Google BigQuery, I got 500 `.json.gz` files. Thereafter I preprocessed them and got the following...

It's too hard for me to translate my cpp code to Python. I would appreciate it if you could build a website that provides an API to translate the code....

At line 1483 of `codegen_sources/model/src/trainer.py`, the code is `self.n_sentences += params.batch_size`. I think it should be `self.n_sentences += len1.size(0)`. https://github.com/facebookresearch/CodeGen/blob/6e93aca63e7bc77287c9965a5080456326651237/codegen_sources/model/src/trainer.py#L1483 With the above bug, the notion of one epoch...
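
The difference between the two counters in the last report can be illustrated with a minimal sketch. It is not the CodeGen trainer: `count_sentences` and the batch sizes are hypothetical, and it assumes that `len1.size(0)` gives the number of sentences actually present in a batch, while `params.batch_size` is a fixed nominal value.

```python
def count_sentences(batch_lengths, nominal_batch_size):
    """Return (nominal_count, actual_count) after iterating over all batches.

    batch_lengths: number of sentences actually present in each batch
                   (hypothetical stand-in for `len1.size(0)`).
    nominal_batch_size: fixed value (hypothetical stand-in for
                        `params.batch_size`).
    """
    nominal = 0
    actual = 0
    for n_in_batch in batch_lengths:
        nominal += nominal_batch_size  # mirrors `+= params.batch_size`
        actual += n_in_batch           # mirrors `+= len1.size(0)`
    return nominal, actual


# Hypothetical data: the last batch holds fewer sentences than the nominal size.
batch_lengths = [32, 32, 32, 7]
nominal, actual = count_sentences(batch_lengths, nominal_batch_size=32)
print(nominal, actual)  # 128 103
```

Under these assumptions the nominal counter overshoots whenever a batch is smaller than the configured size, so any epoch boundary derived from it would be reached after fewer sentences than intended.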