quanteda.dictionaries
Does liwcalike() handle proper regular expressions?
Dear Dr. Benoit,
I tried to run the following:
txt <- c("The red-shirted lawyer gave her yellow-haired, red nose ex-boyfriend $300
out of pity:(.")
dict <- quanteda::dictionary(list(lawyer = c("\\blawyer\\b", "law.er")))
liwcalike(txt, dict, what = "word", valuetype = "regex")
But the word lawyer is not matched:
  docname Segment WPS WC Sixltr Dic lawyer AllPunc Period Comma Colon SemiC QMark Exclam Dash Quote
1   text1       1  24 24   8.33   0      0   29.17   4.17  4.17  4.17     0     0      0 12.5     0
  Apostro Parenth OtherP
1       0       0   12.5
Is this expected behavior? To what extent are regular expressions supported by liwcalike() and, downstream, tokens_lookup.tokens()?
Thank you sincerely, Caspar
Currently, liwcalike() only takes "glob" dictionary patterns, but it would be a reasonable feature request to add valuetype to the function.
To get the equivalent patterns, you would use:
library("quanteda.dictionaries")
txt <- c("The red-shirted lawyer gave her yellow-haired,
red nose ex-boyfriend $300 out of pity:(.")
dict <- quanteda::dictionary(list(lawyer = c("lawyer", "law?er")))
liwcalike(txt, dict)
## docname Segment WPS WC Sixltr Dic lawyer AllPunc Period Comma Colon SemiC
## 1 text1 1 24 24 8.33 4.17 4.17 29.17 4.17 4.17 4.17 0
## QMark Exclam Dash Quote Apostro Parenth OtherP
## 1 0 0 12.5 0 0 0 12.5
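To illustrate why the glob patterns above need no "\\b" anchors, here is a minimal base R sketch. It uses utils::glob2rx() purely as an illustration of glob semantics; it is an assumption that this mirrors how quanteda treats globs (quanteda has its own pattern handling, which is not shown here). The `words` vector is made up for the example.

```r
# Glob patterns as used in the dictionary above
globs <- c("lawyer", "law?er")

# Base R's glob-to-regex conversion: "?" becomes ".", and the whole
# pattern is anchored with ^ and $, so it must match the entire string
as_regex <- utils::glob2rx(globs)
print(as_regex)

# Hypothetical tokens to test against (not taken from the thread)
words <- c("lawyer", "lawyers", "lawer", "outlawyer")

# One column per pattern, one row per token
matched <- vapply(as_regex, function(rx) grepl(rx, words), logical(length(words)))
rownames(matched) <- words
print(matched)
```

Because a glob is matched against the whole token, only "lawyer" matches either pattern here, which is the behavior the word-boundary anchors were meant to enforce in the regex version.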
Thank you for clarifying! I have a dictionary that makes extensive use of Perl regex, so I would indeed like to put my name down for this feature request :)
Sincerely, Caspar
Noted! This will not be hard to add.
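Until that is added, a possible interim workaround is to bypass liwcalike() and apply the regex dictionary directly with quanteda's tokens_lookup(), which takes a valuetype argument. This is only a sketch under that assumption: it returns raw counts per dictionary key, not the percentages or the punctuation columns that liwcalike() reports, and quanteda's regex matching may differ in detail from Perl regex.

```r
library("quanteda")

txt <- c("The red-shirted lawyer gave her yellow-haired, red nose ex-boyfriend $300 out of pity:(.")

# The original regex dictionary from the question
dict <- dictionary(list(lawyer = c("\\blawyer\\b", "law.er")))

# Tokenize, then replace matching tokens with their dictionary key,
# treating the dictionary values as regular expressions
toks <- tokens(txt)
hits <- tokens_lookup(toks, dictionary = dict, valuetype = "regex")

# Tabulate key counts per document
counts <- dfm(hits)
print(counts)
```

To approximate the percentage that liwcalike() reports, the counts could be divided by ntoken(toks) and multiplied by 100, although the token total may not be identical to the WC value computed by liwcalike().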