HoldingPen: good records automatically rejected

Open ksachs opened this issue 7 years ago • 13 comments

Two examples where no keyword (CORE or non-CORE) was extracted on labs: there is no decision from the guesser, and good records are rejected automatically. The first record in particular is a clear case. Something went really wrong in the workflow.

https://labs.inspirehep.net/holdingpen/675064
Neutrino Mass Sum-rule and Neutrinoless Double Beta Decay
should have 6 CORE KWs: CP, violation; (0neutrino); neutrino; electroweak interaction; neutrino, mass; Gran Sasso; double-beta decay

https://labs.inspirehep.net/holdingpen/675482
The isolated, uniformly moving electron
should have 3 CORE KWs: Yang-Mills-Higgs theory; Yang-Mills; magnetic monopole; caloron

@fschwenn

ksachs avatar Jul 11 '17 11:07 ksachs

Another example:
https://labs.inspirehep.net/holdingpen/694229
Lensing Bias to CMB Polarization Measurements of Compensated Isocurvature Perturbations
should have 5 CORE KWs: dark matter; curvaton; neutrino; baryon number; inflaton

ksachs avatar Aug 07 '17 09:08 ksachs

> should have 3 CORE KWs

Sorry @ksachs, but how are you determining that? In particular, are we using the same ontologies/knowledge bases? Because I see that https://github.com/inspirehep/inspire-next/pull/2282 has not yet been merged...

jacquerie avatar Aug 07 '17 11:08 jacquerie

When searching for another bug (BibClassify used only title/abstract instead of the fulltext) we were comparing keywords. Yes, we are using the same ontology and BibClassify parameters. And if there are keywords on labs, they are identical to what we have at DESY. At least as far as I noticed: I don't compare every single KW, and I can't systematically search the holdingpen since there is no API I am capable of using. #2282 is an update of the taxonomy with some additional KWs that are not relevant for this issue.

ksachs avatar Aug 07 '17 11:08 ksachs

2 new examples: https://labs.inspirehep.net/holdingpen/756285 https://labs.inspirehep.net/holdingpen/756286

ksachs avatar Oct 18 '17 11:10 ksachs

https://labs.inspirehep.net/holdingpen/760574

ksachs avatar Oct 25 '17 10:10 ksachs

https://labs.inspirehep.net/api/holdingpen/762556

    "doc": "Mark the workflow object with already-ingested:True.",

Why? arXiv:1710.09270 is not in INSPIRE and 762556 is the only record in the holdingpen.

same for arXiv:1710.09271 https://labs.inspirehep.net/holdingpen/762556

ksachs avatar Oct 26 '17 08:10 ksachs

Just about that last message (762556) (checking one by one):

They were rejected because they are too old (more than 5 days):

(Submitted on 20 Oct 2017)

and thus considered updates, and discarded as we don't yet support updates on labs.

The actual function that checks that is:

    "doc": "IF: args(<function previously_rejected at 0x7dfbd70>, [<function mark at 0x7dfbe60>, <function mark at 0x7dfbf50>]); kwargs().",

I know the names are awful, just bear with me for now, we are working on that.

david-caro avatar Oct 26 '17 08:10 david-caro
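
The age cut-off described in the comment above can be sketched as follows. This is a minimal illustration, not the actual inspire-next code: the function name `too_many_days`, its parameters, and the check date of 26 Oct 2017 are assumptions. Only the 5-day window, the 30-day value proposed later in the thread, and the 20 Oct 2017 submission date come from the discussion.

```python
from datetime import date


def too_many_days(submitted: date, checked: date, max_days: int = 5) -> bool:
    """Return True when a record is older than the allowed window.

    Hypothetical helper: records older than `max_days` are treated as
    updates and discarded, as described in the thread.
    """
    return (checked - submitted).days > max_days


# Submission date quoted in the thread; the check date is an assumption
# (the day the comment was written).
submitted = date(2017, 10, 20)
checked = date(2017, 10, 26)

print(too_many_days(submitted, checked))               # 5-day window
print(too_many_days(submitted, checked, max_days=30))  # 30-day window
```

Under these assumptions the record is 6 days old, so it is flagged with the 5-day window and passes with the 30-day window.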

Thanks David. I thought this time cut-off was disabled. We discussed it last week with Sam. Is it a lot of work to take that out or increase the time window?

ksachs avatar Oct 26 '17 09:10 ksachs

It's not yet removed, we are working on it, but increasing the window should be easy, @kaplun can you take care of it?

david-caro avatar Oct 26 '17 13:10 david-caro

Sure, it's ~~not~~ configurable though. ~~But~~ I can do a quick deployment. I'll set it to 30 days...

kaplun avatar Oct 26 '17 13:10 kaplun

OK! Set to 30 days. Let's see.

kaplun avatar Oct 26 '17 14:10 kaplun

@fschwenn and I searched for missing arXiv articles of 2017. Of those we accepted for INSPIRE, there are:

wrong 'already-ingested' or 'too-many-days' (11), only 1 from November:
arXiv:1702.05629 "Mark the workflow object with already-ingested:True."
arXiv:1703.05574 "Mark the workflow object with already-ingested:True."
arXiv:1708.07444 "Mark the workflow object with already-ingested:True."
arXiv:1709.04483 "Mark the workflow object with too-many-days:True."
arXiv:1709.06876 "Mark the workflow object with already-ingested:True."
arXiv:1709.10399 "Mark the workflow object with already-ingested:True."
arXiv:1710.04496 "Mark the workflow object with already-ingested:True."
arXiv:1710.04703 "Mark the workflow object with already-ingested:True."
arXiv:1710.00618 "Mark the workflow object with already-ingested:True."
arXiv:1710.07616 "Mark the workflow object with already-ingested:True."
arXiv:1711.06093 "Mark the workflow object with too-many-days:True."

no trace in the HP at all (13) (#2528):
arXiv:1701.01022, arXiv:1701.07062, arXiv:1702.08285, arXiv:1703.05573, arXiv:1708.04550, arXiv:1708.06728, arXiv:1708.07361, arXiv:1708.08897, arXiv:1711.06094, arXiv:1711.06044, arXiv:1711.06547, arXiv:1711.06674, arXiv:1711.09009

and 2 with problems getting keywords, which should result in an error (#2528):
arXiv:1702.04175, arXiv:1710.10630

ksachs avatar Jan 08 '18 13:01 ksachs
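
The rejection flags in the first list of the comment above can be tallied directly from the quoted "doc" strings. This is a minimal sketch, assuming the strings always have the form shown in this thread. The helper `extract_flag` and the `DOCS` mapping are hypothetical names for illustration, and the real holdingpen API response may be structured differently.

```python
import re
from collections import Counter

# Pattern taken from the "doc" strings quoted in this thread.
FLAG_RE = re.compile(r"Mark the workflow object with ([\w-]+):True")


def extract_flag(doc):
    """Return the flag name from a "doc" string, or None if absent."""
    match = FLAG_RE.search(doc)
    return match.group(1) if match else None


AI = "Mark the workflow object with already-ingested:True."
TMD = "Mark the workflow object with too-many-days:True."

# The 11 records listed in the comment above.
DOCS = {
    "arXiv:1702.05629": AI,
    "arXiv:1703.05574": AI,
    "arXiv:1708.07444": AI,
    "arXiv:1709.04483": TMD,
    "arXiv:1709.06876": AI,
    "arXiv:1709.10399": AI,
    "arXiv:1710.04496": AI,
    "arXiv:1710.04703": AI,
    "arXiv:1710.00618": AI,
    "arXiv:1710.07616": AI,
    "arXiv:1711.06093": TMD,
}

tally = Counter(extract_flag(doc) for doc in DOCS.values())
for flag, count in tally.most_common():
    print(flag, count)
```

For the records listed, this gives 9 marked already-ingested and 2 marked too-many-days.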

Is this fixed, @michamos?

StellaCh avatar Feb 08 '18 09:02 StellaCh