pytorch-transformers-classification

Can I have more columns in the train set?

Open shainaraza opened this issue 5 years ago • 10 comments

Other than the format specified below, can I have more columns as features?

- guid: An ID for the row.
- label: The label for the row (should be an int).
- alpha: A column of the same letter for all rows. Not used in classification but still expected by the DataProcessor.
- text: The sentence or sequence of text.
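For reference, a minimal sketch of writing rows in the four-column layout described here (guid, label, alpha, text). The helper name `to_tsv`, the tab delimiter, the placeholder letter "a", and the absence of a header row are assumptions for illustration, not something confirmed in this thread.

```python
import csv
import io


def to_tsv(samples):
    """Turn (text, label) pairs into TSV text with guid, label, alpha, text columns.

    Hypothetical helper: the column order follows the format listed above;
    the delimiter and the missing header row are assumptions.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for guid, (text, label) in enumerate(samples):
        # "a" is the throwaway alpha column: the same letter on every row.
        writer.writerow([guid, int(label), "a", text])
    return buf.getvalue()


tsv_text = to_tsv([("great movie", 1), ("bad plot", 0)])
```

Any extra feature columns would have to travel outside this file format, since the DataProcessor only reads these four.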

shainaraza, Mar 13 '20 17:03

Not without creating your own model class. Transformer models only accept a sequence of text as their input.

ThilinaRajapakse, Mar 13 '20 20:03

Thank you very much for your reply. Can I make some changes here?

    def __init__(self, input_ids, input_mask, segment_ids, label_id):
        self.input_ids = input_ids
        self.input_mask = input_mask
        self.segment_ids = segment_ids
        self.label_id = label_id

shainaraza, Mar 14 '20 17:03

I'm not sure where that piece of code is from. Essentially, you'll need to edit the BertForSequenceClassification class in the transformers library so that it can accept additional inputs. You'll also need to write the forward() function to handle the inputs.
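A rough sketch of what such a model class could look like, assuming the extra columns are numeric and are simply concatenated to the pooled text vector before the classification layer. This is not the library's BertForSequenceClassification; the class name `TextPlusFeaturesClassifier` and the `extra_features` argument are hypothetical, and the encoder is assumed to behave like the transformers `BertModel` (pooled output at index 1).

```python
import torch
from torch import nn


class TextPlusFeaturesClassifier(nn.Module):
    """Hypothetical classifier: pooled text vector plus extra numeric columns."""

    def __init__(self, encoder, hidden_size, num_extra_features, num_labels, dropout=0.1):
        super().__init__()
        # Assumption: `encoder` acts like transformers' BertModel, e.g.
        # encoder = BertModel.from_pretrained("bert-base-cased"), and its
        # output at index 1 is a pooled [batch, hidden_size] tensor.
        self.encoder = encoder
        self.dropout = nn.Dropout(dropout)
        # The linear layer sees the text vector and the extra features side by side.
        self.classifier = nn.Linear(hidden_size + num_extra_features, num_labels)

    def forward(self, input_ids, attention_mask, token_type_ids, extra_features, labels=None):
        outputs = self.encoder(
            input_ids=input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
        )
        pooled = self.dropout(outputs[1])
        combined = torch.cat([pooled, extra_features.float()], dim=-1)
        logits = self.classifier(combined)
        if labels is None:
            return logits
        # Labels must be class indices in the range 0..num_labels-1.
        loss = nn.functional.cross_entropy(logits, labels)
        return loss, logits
```

The data loading code would also need to carry the extra columns through to the batch, which the stock feature class does not do.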

ThilinaRajapakse, Mar 15 '20 16:03

Thank you, ThilinaRajapakse, for your great work and timely responses. I am using this library and will definitely acknowledge and cite you in my upcoming work. All the best.

shainaraza, Mar 15 '20 16:03

No problem!

Take a look at Simple Transformers as well. You may find it easier to work with compared to this repo.

ThilinaRajapakse, Mar 15 '20 16:03

Yes, I am using Simple Transformers too; it's super easy to use. I am currently using Google Colab. Sometimes I get the error "RuntimeError: CUDA error: device-side assert triggered". Which cloud GPU services do you suggest? My dataset is about 2 GB. Thanks in advance.

shainaraza, Mar 15 '20 16:03

That error normally happens when you have bad data in your dataset (invalid labels, special characters, etc.)
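A quick way to look for that kind of bad data before training is sketched below. It assumes the dataset is a list of (text, label) pairs and that valid labels are the integers 0 to num_labels - 1; the function name `find_bad_rows` is made up for this example.

```python
def find_bad_rows(rows, num_labels):
    """Return (row_index, reason) pairs for rows likely to break training.

    Assumes each row is a (text, label) pair and that valid labels are
    the integers 0 .. num_labels - 1.
    """
    problems = []
    for i, (text, label) in enumerate(rows):
        if not isinstance(text, str) or not text.strip():
            problems.append((i, "empty or non-string text"))
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(label, bool) or not isinstance(label, int):
            problems.append((i, "label is not an int"))
        elif not 0 <= label < num_labels:
            problems.append((i, "label outside 0..num_labels-1"))
    return problems


rows = [("good", 0), ("", 1), ("x", 2), ("y", "1")]
bad = find_bad_rows(rows, num_labels=2)
```

Here rows 1, 2, and 3 are flagged: an empty text, a label of 2 when only 0 and 1 are valid, and a label stored as a string.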

I don't use cloud GPUs so I'm afraid I can't really recommend any.

ThilinaRajapakse, Mar 15 '20 16:03

Thanks, ThilinaRajapakse, for your timely response once again. I agree with you about the data. One last question for today: can I run the same Simple Transformers models on a CPU? I mean, when you built and tested all these models, did you use GPUs or just a CPU?

shainaraza, Mar 15 '20 16:03

You can run them on either. However, training on a CPU will be far too slow to be practical. I always train using a GPU.
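In plain PyTorch, choosing between the two can be sketched as follows. The helper name `pick_device` and the tiny linear layer are only for illustration; Simple Transformers exposes the same choice through the `use_cuda` argument of its model classes.

```python
import torch


def pick_device():
    """Prefer a CUDA GPU when PyTorch can see one, otherwise fall back to the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


device = pick_device()

# Both the model and its inputs have to live on the same device.
layer = torch.nn.Linear(4, 2).to(device)
batch = torch.randn(8, 4, device=device)
out = layer(batch)
```

The same code runs on either device; only the speed differs.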

ThilinaRajapakse, Mar 15 '20 17:03

Thanks, all the best to you.

shainaraza, Mar 15 '20 17:03