How to apply function .preprocess and others to Pandas df?

Open fatihbozdag opened this issue 1 year ago • 3 comments

Greetings all,

I have a large corpus packed into a Pandas dataframe, and I'd like to iterate over the text column and record the results of the individual functions in separate columns. As far as I understand, the extractor only accepts str. I am trying to merge the scores with the metadata included in the dataframe.

For instance, my dataframe looks as follows.

df.head()
  docid_field  ...                                         text_field
0    BGSU1001  ...   <ICLE-BG-SUN-0001.1> \nIt is time, that our s...
1    BGSU1002  ...   <ICLE-BG-SUN-0002.1> \nNowadays there is a gr...
2    BGSU1003  ...   <ICLE-BG-SUN-0003.1> \nOnce upon a time there...
3    BGSU1004  ...   <ICLE-BG-SUN-0004.1> \nOur educational system...
4    BGSU1005  ...   <ICLE-BG-SUN-0005.1> \nScience, technology an...

Is there a way to apply a LingFeat function to df['text_field'] and record the scores (let's say those from LingFeat.EnDF_()) as tuples in another column? I tried

df['LingFeat'] = df['text_field'].apply(lambda x: extractor.pass_text(x))

and the result is

0      <lingfeat.extractor.pass_text object at 0x0000...
1      <lingfeat.extractor.pass_text object at 0x0000...
2      <lingfeat.extractor.pass_text object at 0x0000...
3      <lingfeat.extractor.pass_text object at 0x0000...
4      <lingfeat.extractor.pass_text object at 0x0000...
                       
923    <lingfeat.extractor.pass_text object at 0x0000...
924    <lingfeat.extractor.pass_text object at 0x0000...
925    <lingfeat.extractor.pass_text object at 0x0000...
926    <lingfeat.extractor.pass_text object at 0x0000...
927    <lingfeat.extractor.pass_text object at 0x0000...
Name: LingFeat, Length: 928, dtype: object

I couldn't get any further. How should I do this, if it is possible at all?

fatihbozdag, Jan 14 '23 20:01
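
One possible reading of the output in the question: extractor.pass_text(x) only constructs the extractor object, so the new column ends up holding objects rather than scores. Below is a minimal sketch of a per-row helper. It assumes, as the issue title suggests, that .preprocess() has to be called on that object before a feature method such as EnDF_(), and that EnDF_() returns a dict-like mapping of feature names to scores. The names endf_scores and StubExtractor are hypothetical and introduced only for illustration; the stub merely stands in for lingfeat so the sketch runs on its own.

```python
from typing import Any, Callable, Dict


def endf_scores(text: str, pass_text: Callable[[str], Any]) -> Dict[str, float]:
    """Run one text through a LingFeat-style extractor and return its EnDF_ scores."""
    lf = pass_text(text)     # builds the extractor object seen in the question's output
    lf.preprocess()          # assumption: must run before any feature method
    return dict(lf.EnDF_())  # assumption: EnDF_() returns a name -> score mapping


class StubExtractor:
    """Hypothetical stand-in for lingfeat's extractor, not the real class."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.ready = False

    def preprocess(self) -> None:
        self.ready = True

    def EnDF_(self) -> Dict[str, float]:
        if not self.ready:
            raise RuntimeError("call preprocess() first")
        # Made-up feature, only so the stub returns something dict-shaped.
        return {"n_tokens": float(len(self.text.split()))}


# Made-up texts; with the real library, pass extractor.pass_text instead of the stub.
texts = ["first sample text", "second one"]
rows = [endf_scores(t, StubExtractor) for t in texts]
```

With the real library, the same helper could be used as df['LingFeat'] = df['text_field'].apply(lambda x: endf_scores(x, extractor.pass_text)). If tuples are preferred, tuple(scores.values()) would convert each result. If each result is a dict, pd.DataFrame(df['LingFeat'].tolist(), index=df.index) should spread the scores into separate columns that can then be attached to the metadata with df.join(...).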