
Use NLP with dataframes and labels

Open: haris525 opened this issue 2 years ago • 1 comment

Hello

I have an issue similar to one someone else asked about. I have a dataframe with a text column and a class column (about 5 classes). Some classes are underrepresented, so I would like to augment the text column per class to balance them a bit more. How would I go about doing this?

I tried following the approach here

https://github.com/makcedward/nlpaug/issues/209

Here is my code:


aug_data = []
for group, d in mydataframe.groupby(['class']):
  a_data = aug_wordnet.augment(d)
  a_data = pd.DataFrame(aug_data, columns=['text'])
  a_data['class'] = class
  aug_data.append(a_data)

aug_data = pd.concat(aug_data)

but it gives me the error message AttributeError: 'DataFrame' object has no attribute 'strip'. My class column has dtype int64 and my text column has dtype object.

Thanks

haris525 avatar Apr 12 '22 17:04 haris525

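Consider using the following sample code, which passes the text column to augment() as a list of strings instead of passing the whole DataFrame:

import pandas as pd

aug_data = []
# Group on the column name itself so that `group` is the class label
for group, d in mydataframe.groupby('class'):
  # augment() works on strings, so pass a list of strings, not a DataFrame
  a_data = aug_wordnet.augment(d["your column"].tolist())
  # Build the frame from this group's augmented texts
  a_data = pd.DataFrame(a_data, columns=['text'])
  # `class` is a reserved word in Python; use the loop variable instead
  a_data['class'] = group
  aug_data.append(a_data)

aug_data = pd.concat(aug_data, ignore_index=True)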

makcedward avatar Jul 07 '22 05:07 makcedward
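
The loop above adds one augmented copy of every row, which keeps the class proportions roughly unchanged. As a rough sketch of how the balancing goal in the question could be approached, the example below augments only as many rows per class as are needed to reach the size of the largest class. The helper name balance_by_augmentation, the stand-in fake_augment function, and the sample data are hypothetical and not part of nlpaug. The sketch assumes the augmenter returns one augmented string per input string; with nlpaug you would presumably pass aug_wordnet.augment in place of fake_augment.

```python
import pandas as pd


def balance_by_augmentation(df, augment_fn, text_col="text", label_col="class"):
    """Add augmented rows to smaller classes until each matches the largest class.

    augment_fn is assumed to take a list of strings and return a list of
    augmented strings of the same length (hypothetical contract).
    """
    target = df[label_col].value_counts().max()
    parts = [df[[text_col, label_col]]]
    for label, group in df.groupby(label_col):
        deficit = target - len(group)
        if deficit <= 0:
            continue  # this class is already at the target size
        # Pick source texts (with replacement), one per missing example
        sources = (
            group[text_col]
            .sample(n=deficit, replace=True, random_state=0)
            .tolist()
        )
        new_texts = augment_fn(sources)
        parts.append(pd.DataFrame({text_col: new_texts, label_col: label}))
    return pd.concat(parts, ignore_index=True)


# Stand-in augmenter for illustration only (not an nlpaug function)
def fake_augment(texts):
    return [t + " (augmented)" for t in texts]


# Hypothetical sample data: class 0 has 3 rows, class 1 has 2, class 2 has 1
mydataframe = pd.DataFrame({
    "text": ["good", "great", "fine", "bad", "poor", "meh"],
    "class": [0, 0, 0, 1, 1, 2],
})

balanced = balance_by_augmentation(mydataframe, fake_augment)
print(balanced["class"].value_counts().sort_index())
```

Sampling with replacement means the same source sentence may be augmented more than once in very small classes, so a randomized augmenter is preferable there to avoid exact duplicates.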