Example on README did not work

Open bkowshik opened this issue 7 years ago • 7 comments

Example on README

import mwapi
from revscoring import ScorerModel
from revscoring.extractors.api.extractor import Extractor

with open("models/enwiki.damaging.linear_svc.model") as f:
    scorer_model = ScorerModel.load(f)

extractor = Extractor(mwapi.Session(host="https://en.wikipedia.org", user_agent="revscoring demo"))
feature_values = list(extractor.extract(123456789, scorer_model.features))

print(scorer_model.score(feature_values))
# {'prediction': True, 'probability': {False: 0.4694409344514984, True: 0.5305590655485017}}

Error message

FileNotFoundError                         Traceback (most recent call last)
<ipython-input-1-a72266a26745> in <module>()
      3 from revscoring.extractors.api.extractor import Extractor
      4
----> 5 with open("models/enwiki.damaging.linear_svc.model") as f:
      6     scorer_model = ScorerModel.load(f)
      7

FileNotFoundError: [Errno 2] No such file or directory: 'models/enwiki.damaging.linear_svc.model'

I am wondering 🤔 what should be done to get the example working.


cc: @halfak @geohacker

bkowshik avatar May 03 '17 12:05 bkowshik

I found a similar example in the examples folder which did not work either.

$ python examples/scoring.py
Traceback (most recent call last):
  File "examples/scoring.py", line 5, in <module>
    with open("models/enwiki.damaging.linear_svc.model") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'models/enwiki.damaging.linear_svc.model'

cc: @batpad

bkowshik avatar May 04 '17 10:05 bkowshik

Good questions! So this example doesn't work, that's right. We'd need to rebuild a model and keep it in sync with the repository in order for this to continue to work as intended. This is hard because pickle is our main serializer, and pickled models break easily when the code they were built with changes. I don't think it would make sense to require every merged PR to update the model.

I see two good options here:

  1. Drop the file loading example. Loading serialized files is inherently problematic.
  2. Have the example create the serialized file and load it.

For (1), this might make sense because this repository doesn't store up-to-date models. For (2), we'll have the problem that you can't really train a model in < 10 lines of code in any useful way.
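To make option (2) concrete, here is a minimal sketch that uses only the standard library: the example writes its own serialized file and then loads it, so it never depends on a file that is missing from the repository. The `toy_model` dict, the `score` function, and the `toy.model` filename are hypothetical stand-ins, not part of revscoring; training a real model is exactly the part that does not fit in a few lines.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model. A real revscoring model
# would be trained and serialized by the library, which is not shown here.
toy_model = {"name": "toy.damaging", "threshold": 0.5}


def score(model, value):
    """Toy scoring rule: predict True when value meets the threshold."""
    return {"prediction": value >= model["threshold"]}


with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "toy.model")

    # Step 1: the example creates the serialized file itself.
    with open(model_path, "wb") as f:
        pickle.dump(toy_model, f)

    # Step 2: load it back. Pickle files must be opened in binary mode.
    with open(model_path, "rb") as f:
        loaded_model = pickle.load(f)

result = score(loaded_model, 0.53)
print(result)
```

The file lives in a temporary directory, so nothing has to be committed or kept in sync with the library.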

See https://github.com/wiki-ai/editquality for an example of a repository that does store models that are sync'd to a version of this library.

halfak avatar May 04 '17 15:05 halfak

> See https://github.com/wiki-ai/editquality for an example of a repository that does store models that are sync'd to a version of this library.

I know that a repository has trained models when the git clone takes more time than it should. 😬

bkowshik avatar May 05 '17 10:05 bkowshik

We'd like to use git-lfs for this, but our internal infra doesn't support it :(

halfak avatar May 05 '17 14:05 halfak

> For (2), we'll have the problem that you can't really train a model in < 10 lines of code in any useful way.

Is there an example longer than 10 lines available?

he7d3r avatar Apr 08 '20 18:04 he7d3r

This is essentially the same as https://phabricator.wikimedia.org/T250635. See also https://github.com/wikimedia/revscoring/pull/486.

he7d3r avatar Apr 30 '20 17:04 he7d3r

#486 is merged, so there is no need to mention it. I would still like @bkowshik to try out the new example before continuing. If the new example is OK, this issue should be closed.

Edit: The model file needs to be created, as it is not included in the repository. Please see this
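
Since the model file is not shipped, an example could fail with a clearer message than the bare `FileNotFoundError` reported above. A rough sketch, assuming a hypothetical helper named `open_model_file` (not a revscoring function):

```python
import os


def open_model_file(path):
    """Open a serialized model in binary mode, with a clearer error if it is missing."""
    if not os.path.exists(path):
        raise FileNotFoundError(
            f"Model file {path!r} not found. It is not shipped with the "
            "repository and has to be created before running the example."
        )
    return open(path, "rb")


try:
    with open_model_file("models/enwiki.damaging.linear_svc.model") as f:
        print("model file found; pass f to the loader")
except FileNotFoundError as error:
    print(error)
```

The helper only checks that the path exists; it does not validate that the file is a compatible model.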

ghost avatar Oct 06 '20 12:10 ghost