Handling of quotations

Open jpfairbanks opened this issue 7 years ago • 13 comments

How does BSD handle statements inside quotations?

jpfairbanks avatar Nov 21 '17 14:11 jpfairbanks

Currently, BSD doesn't do anything special with quoted material.
However, given that BSD is engineered and tuned to the statement/sentence level of analysis, it would be fairly trivial to extract (or at least flag) quoted material for further analysis, or to ignore it, depending on the use case. I can see a need for both options: journalists or researchers who don't want to penalize an article for bias that comes only from its quoted material may want to ignore text in quotes, while researchers who want to explore the delta between the objectivity/bias of the article text versus the quoted material within it (or make some other comparison) would want it flagged. Implementation-wise: would it work for most purposes if we just added another routine (something like "isInQuote" for the input sentence) that outputs a boolean to the features dict returned by "extract_bias_features"?
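
A minimal sketch of what such a routine might look like, for discussion only. Everything here is an assumption rather than BSD's actual code: the helper names `is_in_quote` and `add_quote_flag`, the `is_in_quote` dict key, the 50% threshold, and the choice to match only straight and curly double quotes (single quotes are skipped because they collide with apostrophes).

```python
import re

# Assumption: a quotation is text between straight or curly double quotes.
QUOTE_RE = re.compile(r'["\u201c]([^"\u201c\u201d]+)["\u201d]')


def is_in_quote(sentence: str, min_fraction: float = 0.5) -> bool:
    """Return True when at least min_fraction of the sentence is quoted text."""
    stripped = sentence.strip()
    if not stripped:
        return False
    # Count only the characters between the quote marks.
    quoted_chars = sum(len(m.group(1)) for m in QUOTE_RE.finditer(stripped))
    return quoted_chars / len(stripped) >= min_fraction


def add_quote_flag(features: dict, sentence: str) -> dict:
    """Return a copy of a features dict with a hypothetical 'is_in_quote' key."""
    updated = dict(features)
    updated["is_in_quote"] = is_in_quote(sentence)
    return updated
```

The threshold is there because a sentence can be only partly quoted; a caller who wants "contains any quote at all" could pass a very small `min_fraction` instead.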

cjhutto avatar Nov 21 '17 16:11 cjhutto

Just for the sake of documentation: on our recent call with GV, I proposed a few ways to handle quotes as a starting point.

  1. detect quotations and remove them and then do bias scoring
  2. detect quotations and leave them in and then do bias scoring
  3. introduce a weighting for how much the bias scores of quotes affect the overall article-level bias score (e.g. quotes might be given half as much importance as actual article statements)
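
The three options above can be seen as one weighted average with a single knob. The sketch below is hypothetical: the function name `article_bias_score`, its parameters, and the use of a weighted mean for the article-level score are assumptions, not how BSD aggregates scores.

```python
def article_bias_score(sentence_scores, quote_flags, quote_weight=0.5):
    """Weighted mean of per-sentence bias scores.

    quote_weight=0.0 drops quoted sentences (option 1), 1.0 treats them
    like any other sentence (option 2), and values in between
    down-weight them (option 3).
    """
    if len(sentence_scores) != len(quote_flags):
        raise ValueError("sentence_scores and quote_flags must align")
    # Quoted sentences get quote_weight; everything else gets full weight.
    weights = [quote_weight if quoted else 1.0 for quoted in quote_flags]
    total = sum(weights)
    if total == 0:
        return 0.0  # nothing left to score, e.g. an all-quote article at weight 0
    return sum(w * s for w, s in zip(weights, sentence_scores)) / total
```

With `quote_weight=0.5`, a quoted sentence counts half as much as an article statement, which matches the example given in option 3.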

Yes, agreed: there are different reasons why individuals would or wouldn't want to have quotes included; we discussed these briefly on the call. I'm running BSD on some articles to discuss with C from GV.

Yes, the "isInQuote" boolean would be useful to have 👍

scottagt avatar Nov 21 '17 17:11 scottagt

We also discussed a median quote length feature, which would detect an adversarial style such as scare quotes and quotes taken out of context.
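
As a rough illustration of the median quote length idea: scare quotes tend to be one or two words long, so an article that relies on them should show a low median. The function name `median_quote_length`, the double-quote regex, and measuring length in whitespace-separated words are all assumptions for this sketch.

```python
import re
from statistics import median

# Assumption: a quotation is text between straight or curly double quotes.
QUOTE_RE = re.compile(r'["\u201c]([^"\u201c\u201d]+)["\u201d]')


def median_quote_length(text: str) -> float:
    """Median length, in words, of the quoted spans in text (0.0 if none)."""
    lengths = [len(m.group(1).split()) for m in QUOTE_RE.finditer(text)]
    return float(median(lengths)) if lengths else 0.0
```

Unbalanced straight quotes will pair up incorrectly under this regex, so a real implementation would likely need something more robust.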

jpfairbanks avatar Nov 28 '17 22:11 jpfairbanks

How do we want to move forward on this?

jpfairbanks avatar Dec 19 '17 16:12 jpfairbanks

I'm adding consideration for quotes in my next update to the features... hope to get it done by Sat night.

cjhutto avatar Dec 20 '17 17:12 cjhutto

Ok great, let me know if you need anything.

jpfairbanks avatar Dec 21 '17 14:12 jpfairbanks

Yes, let us know

scottagt avatar Dec 21 '17 14:12 scottagt

Made some big changes, and continuing to do so as I work through my punch list. One of the changes addresses the request to take quotes into account. So far, I've included new features such as has_quotes, mean_quote_length, and mean_nonquote_length.
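
For readers following along, here is one way features with these names could be computed. This is a guess at the general shape, not the committed code: the function `quote_features`, the double-quote regex, and measuring length in words (rather than characters) are assumptions.

```python
import re

# Assumption: a quotation is text between straight or curly double quotes.
QUOTE_RE = re.compile(r'["\u201c]([^"\u201c\u201d]+)["\u201d]')


def _mean_words(segments):
    """Mean word count over the non-empty segments (0.0 if there are none)."""
    counts = [len(s.split()) for s in segments if s.strip()]
    return sum(counts) / len(counts) if counts else 0.0


def quote_features(text: str) -> dict:
    """Split text into quoted and non-quoted segments and summarize both."""
    # With one capturing group, re.split alternates non-quoted and quoted parts.
    parts = QUOTE_RE.split(text)
    nonquoted, quoted = parts[0::2], parts[1::2]
    return {
        "has_quotes": bool(quoted),
        "mean_quote_length": _mean_words(quoted),
        "mean_nonquote_length": _mean_words(nonquoted),
    }
```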

cjhutto avatar Dec 26 '17 15:12 cjhutto

Is there a PR coming?

jpfairbanks avatar Jan 02 '18 16:01 jpfairbanks

I merged and synced - should be there, right?

cjhutto avatar Jan 03 '18 14:01 cjhutto

@cjhutto it broke the build. Can you describe the changes you made?

jpfairbanks avatar Jan 18 '18 18:01 jpfairbanks

This commit deletes all of the ref_lexicons. https://github.com/cjhutto/bsd/commit/f58fabd16f07dfb6e3696f520fde3d150f76aba4

How do you want to handle lexicons going forward?

jpfairbanks avatar Jan 18 '18 18:01 jpfairbanks

I patched the setup.py to use the new ref_lexicons.

jpfairbanks avatar Feb 01 '18 14:02 jpfairbanks