Still getting an error even though I am handling this case with the format attribute of dspy.OutputField()

Open anushka192001 opened this issue 4 months ago • 6 comments

12 frames
in (x)
      3
      4 speech = dspy.InputField()
----> 5 extracted_tokens = dspy.OutputField(desc="extract words from text",format=lambda x: ' '.join(x) if isinstance(x, list) else str(x))

TypeError: sequence item 0: expected str instance, list found
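For reference, this TypeError is what `' '.join(x)` raises when `x` is a list whose items are themselves lists. Whether DSPy actually passes a nested list to `format` here is an assumption, since the thread does not show the input. Under that assumption, a minimal sketch of a formatter that flattens nested lists before joining could look like this (`format_tokens` is a hypothetical name, not part of DSPy):

```python
def format_tokens(value):
    """Flatten arbitrarily nested lists/tuples and join the leaves with spaces."""
    def leaves(item):
        # Recurse into containers; convert every leaf to a string.
        if isinstance(item, (list, tuple)):
            for sub in item:
                yield from leaves(sub)
        else:
            yield str(item)
    return " ".join(leaves(value))


# The lambda from the traceback fails on nested lists because
# str.join only accepts string items.
try:
    " ".join([["hello", "world"], ["foo"]])
except TypeError as exc:
    error_message = str(exc)

# The flattening version handles the same input.
flat = format_tokens([["hello", "world"], ["foo"]])
```

Such a function could then be passed as `format=format_tokens` in place of the lambda, assuming the `format` argument is called with the raw field value.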

anushka192001 avatar Apr 20 '24 09:04 anushka192001

Hm, that’s strange. Can you share a small example to replicate the error?

okhat avatar Apr 20 '24 16:04 okhat

@okhat Never mind, I worked around that one. But now I am getting a new error, "index out of range in self", when I use more than 14 examples to train. When I decrease the number of examples, it works fine.

I am using gpt2 from Hugging Face, and a small snippet of the code is:

# teleprompter to optimize the weights of this program using a metric defined validate_Pred_Imp_words
from dspy.teleprompt import BootstrapFewShot, Ensemble

# Define the validation logic for the hate speech classification task
def validate_hate_speech(example, pred, trace=None):
    for i in range(3):
        if i + 1 <= len(example.extracted_tokens):
            if not find_similarity(example.extracted_tokens[i], pred.pred_extracted_tokens[i]):
                return False
        if i + 1 <= len(example.label):
            if str(example.label[i]) != str(pred.pred_label[i]):
                return False
        if i + 1 <= len(example.target):
            if not find_similarity(example.target[i], pred.pred_target[i]):
                return False
    return True

# Set up the teleprompter
teleprompter = BootstrapFewShot(metric=validate_hate_speech)

# Compile the program
compiled_program = teleprompter.compile(HateSpeechClassifier(), trainset=extract_words_dataset_train[:14])

anushka192001 avatar Apr 21 '24 07:04 anushka192001

Hi @anushka192001 , could you share the full error trace for this program? It's unclear what "index out of range in self" refers to

arnavsinghvi11 avatar Apr 27 '24 23:04 arnavsinghvi11

@arnavsinghvi11 Can you please share your email address? I will send you the code on which I am getting this error.

anushka192001 avatar Apr 28 '24 10:04 anushka192001

@arnavsinghvi11 THE ERROR-----

0%| | 0/100 [00:00<?, ?it/s]
Token indices sequence length is longer than the specified maximum sequence length for this model (1037 > 1024). Running this sequence through the model will result in indexing errors
Setting pad_token_id to eos_token_id:50256 for open-end generation.
Setting pad_token_id to eos_token_id:50256 for open-end generation.
Setting pad_token_id to eos_token_id:50256 for open-end generation.
Failed to run or to evaluate example Example({'speech': #REDACTED (input_keys={'speech'}) with <function validate_hate_speech at 0x7eeb746a1090> due to index out of range in self.
Failed to run or to evaluate example Example({'speech': #REDACTED(input_keys={'speech'}) with <function validate_hate_speech at 0x7eeb746a1090> due to index out of range in self.

3%|▎ | 3/100 [00:07<04:04, 2.53s/it]
Setting pad_token_id to eos_token_id:50256 for open-end generation.
Failed to run or to evaluate example Example({'speech': #REDACTED (input_keys={'speech'}) with <function validate_hate_speech at 0x7eeb746a1090> due to index out of range in self.

4%|▍ | 4/100 [00:16<07:29, 4.68s/it]
Setting pad_token_id to eos_token_id:50256 for open-end generation.
4%|▍ | 4/100 [00:16<06:43, 4.21s/it]
Failed to run or to evaluate example Example({'speech'#REDACTED(input_keys={'speech'}) with <function validate_hate_speech at 0x7eeb746a1090> due to index out of range in self.

IndexError Traceback (most recent call last)
in <cell line: 37>()
     35
     36 # Compile the program
---> 37 compiled_program = teleprompter.compile(HateSpeechClassifier(), trainset=extract_words_dataset_train[:100])

25 frames

/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   2235 # remove once script supports set_grad_enabled
   2236 _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2237 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   2238
   2239

IndexError: index out of range in self
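The tokenizer warning in the log (1037 > 1024) suggests a likely cause, though nobody in the thread confirms it. GPT-2 has 1024 position embeddings, so a prompt longer than that asks `torch.embedding` for a row that does not exist, and more training examples mean more demonstrations in the prompt. A minimal sketch of guarding against this is to trim the token ids before generation. `truncate_for_generation` is a hypothetical helper, not a DSPy or transformers API, and `max_new_tokens=50` is an illustrative value.

```python
GPT2_MAX_POSITIONS = 1024  # context window reported in the warning above


def truncate_for_generation(token_ids, max_new_tokens, max_positions=GPT2_MAX_POSITIONS):
    """Keep only the most recent tokens so prompt + generated tokens fit the window."""
    if max_new_tokens >= max_positions:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    budget = max_positions - max_new_tokens
    # Negative slicing keeps the tail; shorter inputs are returned unchanged.
    return token_ids[-budget:]


# With Hugging Face, token ids would typically come from
# tokenizer(prompt)["input_ids"]; a plain list stands in for them here.
prompt_ids = list(range(1037))  # same length as in the warning
trimmed = truncate_for_generation(prompt_ids, max_new_tokens=50)
```

Keeping the tail drops the start of the prompt, which may cut off instructions or early demonstrations. An alternative worth trying is to shrink the prompt at the source, for example by lowering `max_bootstrapped_demos` and `max_labeled_demos` on `BootstrapFewShot`, or by using a model with a longer context window.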

anushka192001 avatar Apr 28 '24 10:04 anushka192001

Hi @anushka192001 ,

Please refrain from pasting code that has harmful content going forward. I have corrected this on the various issues you raised. The issue is still unclear, as the stack trace you've pasted does not highlight the impacted DSPy components. I see that 25 frames have not been shown. Can you please post only the ones related to DSPy?

Closing duplicate issues related to this.

arnavsinghvi11 avatar Apr 28 '24 18:04 arnavsinghvi11