bsd
Handling of quotations
How does bsd handle statements in quotations?
Currently, BSD doesn't do anything special with quoted material.
However, given that BSD is engineered and tuned to the statement/sentence level of analysis, it would be fairly trivial to extract (or at least flag) quoted material for further analysis, or to ignore it, depending on the use case. I can see a need for both options: journalists or researchers who don't want to penalize an article as biased simply because the quoted material is biased (they may want to just ignore text in quotes), or researchers who want to explore the delta between the objectivity/bias of the article text and that of the quoted material within it, or some other comparison.
Implementation-wise: would it work for most purposes if we just add another routine (something like "isInQuote" for the input sentence) that outputs a boolean to the features dict that gets returned in "extract_bias_features"?
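To make the "isInQuote" idea concrete, here is a minimal sketch, assuming sentences arrive as plain strings and quotes are marked with straight ("...") or curly (“...”) double quotes. The names `is_in_quote` and `add_quote_flag` and the `"is_in_quote"` key are hypothetical; this is not bsd's actual `extract_bias_features` code, just an illustration of adding one boolean to a features dict.

```python
import re

# Matches a span between straight ("...") or curly (“...”) double quotes.
# Assumption: single quotes and nested quotes are not handled.
QUOTE_PATTERN = re.compile(r'"([^"]+)"|“([^”]+)”')


def is_in_quote(sentence: str) -> bool:
    """Hypothetical helper: True if the sentence contains a double-quoted span."""
    return QUOTE_PATTERN.search(sentence) is not None


def add_quote_flag(features: dict, sentence: str) -> dict:
    """Add the boolean flag to an existing features dict (key name is an assumption)."""
    features["is_in_quote"] = is_in_quote(sentence)
    return features


if __name__ == "__main__":
    print(add_quote_flag({}, 'He said "this is fine."'))
    print(add_quote_flag({}, "No quotes here."))
```

A regex check keeps the flag cheap to compute per sentence; if quotes span several sentences, the sentence splitter would need to carry quote state across sentences, which this sketch does not do.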
Just for the sake of documenting: on our recent call with GV, I proposed a few ways to handle quotes, to start with:
- detect quotations and remove them and then do bias scoring
- detect quotations and leave them in and then do bias scoring
- introduce a weighting for how much the bias scores of quotes affect the overall article-level bias score (e.g., maybe quotes are given half as much importance as actual article statements)
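All three options can be expressed as one weighted average, where the weight on quoted sentences is 0 (remove), 1 (keep), or something in between. This is a sketch under the assumption that bsd yields one bias score per sentence and that each sentence is tagged as quote or non-quote; `article_bias_score` and `quote_weight` are hypothetical names, and a weighted mean is just one plausible way to aggregate.

```python
from typing import Sequence


def article_bias_score(
    scores: Sequence[float],
    is_quote: Sequence[bool],
    quote_weight: float = 1.0,
) -> float:
    """Weighted mean of sentence-level bias scores.

    quote_weight=0.0 -> quotes removed before scoring
    quote_weight=1.0 -> quotes kept and scored like any other sentence
    quote_weight=0.5 -> quotes count half as much as the article's own statements
    """
    if len(scores) != len(is_quote):
        raise ValueError("scores and is_quote must have the same length")
    weights = [quote_weight if quoted else 1.0 for quoted in is_quote]
    total_weight = sum(weights)
    if total_weight == 0:
        # e.g., every sentence is a quote and quotes are removed
        return 0.0
    return sum(w * s for w, s in zip(weights, scores)) / total_weight


if __name__ == "__main__":
    scores, flags = [0.2, 0.8], [False, True]
    for w in (0.0, 1.0, 0.5):
        print(w, article_bias_score(scores, flags, quote_weight=w))
```

With one non-quote sentence scored 0.2 and one quote scored 0.8, weights of 0, 1 and 0.5 give 0.2, 0.5 and roughly 0.4, so the "remove" and "keep" options fall out as the two endpoints of the weighting option.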
Yes, agreed; there are different reasons why individuals would or wouldn't want to have quotes included, and we discussed these briefly on the call. I'm running bsd on some articles to discuss with C from GV.
Yes, the "isInQuote" boolean would be useful to have 👍
We also discussed a median quote length feature, which could help detect adversarial styles such as scare quotes and quotes taken out of context.
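One way to compute the median quote length is sketched below, assuming length is measured in words and quotes use straight or curly double quotes. The function names are hypothetical. The intuition from the call is that scare quotes tend to wrap one or two words, so a low median over an article's quotes may hint at that style; no threshold is proposed here.

```python
import re
import statistics

# Straight ("...") or curly (“...”) double-quoted spans; nesting is not handled.
QUOTE_PATTERN = re.compile(r'"([^"]+)"|“([^”]+)”')


def quoted_spans(text: str) -> list[str]:
    """Return the text inside each double-quoted span."""
    return [m.group(1) or m.group(2) for m in QUOTE_PATTERN.finditer(text)]


def median_quote_length(text: str) -> float:
    """Median length, in words, of the quoted spans in text (0.0 if there are none)."""
    lengths = [len(span.split()) for span in quoted_spans(text)]
    return float(statistics.median(lengths)) if lengths else 0.0


if __name__ == "__main__":
    article = (
        'The so-called "experts" offered a "plan" that, in the mayor\'s words, '
        '"will fix the budget by next year."'
    )
    print(median_quote_length(article))
```

For that example the quote lengths are 1, 1 and 7 words, so the median is 1.0; the median rather than the mean keeps a single long, legitimate quote from masking several short scare quotes.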
How do we want to move forward on this?
I'm adding consideration for quotes in my next update to the features... hope to get it done by Sat night.
Ok great, let me know if you need anything.
Yes, let us know
Made some big changes, and continuing to do so as I work through my punch list. Among the changes was addressing the desire to consider use of quotes. So far, I've included new features such as has_quotes, mean_quote_length, mean_nonquote_length.
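For anyone reviewing the new features, this is a rough sketch of how `has_quotes`, `mean_quote_length` and `mean_nonquote_length` could be computed. It is not the committed bsd code: it assumes lengths are counted in words and that "nonquote" means the stretches of text between quoted spans, both of which are guesses about the intended definitions.

```python
import re

# Whole quoted spans (quote marks included); no capture groups, so
# findall returns full matches and split returns only the outside text.
QUOTE_PATTERN = re.compile(r'"[^"]+"|“[^”]+”')


def _mean(values: list[int]) -> float:
    return sum(values) / len(values) if values else 0.0


def quote_features(text: str) -> dict:
    """Sketch of quote-related features; definitions are assumptions."""
    quotes = QUOTE_PATTERN.findall(text)
    # Text outside the quotes, split at each quoted span; drop blank pieces.
    nonquotes = [seg for seg in QUOTE_PATTERN.split(text) if seg.strip()]
    return {
        "has_quotes": bool(quotes),
        "mean_quote_length": _mean([len(q.strip('"“”').split()) for q in quotes]),
        "mean_nonquote_length": _mean([len(seg.split()) for seg in nonquotes]),
    }


if __name__ == "__main__":
    print(quote_features('Critics called it "reckless" while supporters said "it is overdue."'))
```

On that example the quotes are 1 and 3 words long and the two outside segments are 3 words each, giving `has_quotes=True`, `mean_quote_length=2.0` and `mean_nonquote_length=3.0`.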
Is there a PR coming?
I merged and synced - should be there, right?
@cjhutto it broke the build. Can you describe the changes you made?
This commit deletes all of the ref_lexicons. https://github.com/cjhutto/bsd/commit/f58fabd16f07dfb6e3696f520fde3d150f76aba4
How do you want to handle lexicons going forward?
I patched the setup.py to use the new ref_lexicons.