
Unable to run example humaneval code

Open · yaoyanglee opened this issue on Jun 12, 2023 · 4 comments

```python
!pip install sentencepiece

import os

from torch.utils.data import TensorDataset

from codetf.models import load_model_pipeline
from codetf.data_utility.human_eval_dataset import HumanEvalDataset
from codetf.performance.model_evaluator import ModelEvaluator

os.environ["HF_ALLOW_CODE_EVAL"] = "1"
os.environ["TOKENIZERS_PARALLELISM"] = "true"

model_class = load_model_pipeline(model_name="causallm", task="pretrained",
                                  model_type="codegen-350M-mono", is_eval=True,
                                  load_in_8bit=True, weight_sharding=False)

dataset = HumanEvalDataset(tokenizer=model_class.get_tokenizer())
prompt_token_ids, prompt_attention_masks, references = dataset.load()

problems = TensorDataset(prompt_token_ids, prompt_attention_masks)

evaluator = ModelEvaluator(model_class)
avg_pass_at_k = evaluator.evaluate_pass_k(problems=problems, unit_tests=references)
print("Pass@k: ", avg_pass_at_k)
```

Above is the code I ran. When executing it in Google Colab, `dataset.load()` raised the following error:

```
in <cell line: 15>:15

/usr/local/lib/python3.10/dist-packages/codetf/data_utility/human_eval_dataset.py:29 in load

   26 │           unit_test = re.sub(r'METADATA = {[^}]*}', '', unit_test, flags=re.MULTILINE)
   27 │           references.append(unit_test)
   28 │
 ❱ 29 │       prompt_token_ids, prompt_attention_masks = self.process_data(prompts, use_max_le
   30 │
   31 │       return prompt_token_ids, prompt_attention_masks, references
   32 │

TypeError: BaseDataset.process_data() got an unexpected keyword argument 'use_max_length'
```

Looking through the source code, I can't find a `use_max_length` parameter on `BaseDataset.process_data()`; it only seems to accept `max_length`. Would anyone mind shedding some light on the issue?
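As a stop-gap, something like the monkey-patch below might get past the error. This is only an untested sketch: it assumes `BaseDataset` is importable from `codetf.data_utility.base_dataset` (the import path is a guess; adjust it to your installed version) and that it is acceptable to drop the stale `use_max_length` keyword and let `process_data()` fall back to its default `max_length` handling.

```python
# Untested workaround sketch: swallow the stale `use_max_length` kwarg so that
# HumanEvalDataset.load() can call BaseDataset.process_data() without raising.
# The import path below is an assumption about where BaseDataset is defined.
from codetf.data_utility.base_dataset import BaseDataset

_original_process_data = BaseDataset.process_data

def _patched_process_data(self, *args, use_max_length=None, **kwargs):
    # Drop the keyword that process_data() does not recognise and delegate
    # to the original implementation with its default max_length behaviour.
    return _original_process_data(self, *args, **kwargs)

BaseDataset.process_data = _patched_process_data
```

Applying the patch before constructing `HumanEvalDataset` should at least avoid the `TypeError`; whether the default `max_length` is appropriate for HumanEval prompts is a separate question.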

yaoyanglee · Jun 12 '23 02:06