SkillNER

Issue when creating skill db

Open Kkkassini opened this issue 1 year ago • 0 comments

Following how_new_db.md, I created a custom skill_db_relax_20.json, but there are slight differences between my version and yours. For example, in my version I get:

"KS1201Q70VWZPS6KTMFR": { "skill_name": "3GPP2 (Telecommunication)", "skill_type": "Specialized Skill", "skill_len": 2, "high_surfce_forms": { "full": "3gpp2 telecommun" }, "low_surface_forms": [ "3gpp2 telecommun", "telecommun 3gpp2" ], "match_on_tokens": false }

while your version has:

"KS1201Q70VWZPS6KTMFR": { "skill_name": "3GPP2 (Telecommunication)", "skill_type": "Hard Skill", "skill_len": 1, "high_surfce_forms": {"full": "3gpp2"}, "low_surface_forms": [], "match_on_tokens": false, }

With the classic stemming approach, high_surfce_forms comes out as in my version, whereas your version contains the correct abbreviated form. I also see that you use abv = SKILL_DB[key]['abbreviation'], but the EMSI endpoint never returns any abbreviation information. Did you retrieve this information from somewhere else, or was it added manually?
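To make the mismatch easier to reproduce, here is a minimal sketch that compares the two entries quoted above field by field. The helper `diff_entries` and the variable names are hypothetical and not part of SkillNER; the field names (including `high_surfce_forms`) are copied as they appear in the database.

```python
# Hypothetical helper for comparing two skill DB entries; not part of SkillNER.
mine = {
    "skill_name": "3GPP2 (Telecommunication)",
    "skill_type": "Specialized Skill",
    "skill_len": 2,
    "high_surfce_forms": {"full": "3gpp2 telecommun"},
    "low_surface_forms": ["3gpp2 telecommun", "telecommun 3gpp2"],
    "match_on_tokens": False,
}

theirs = {
    "skill_name": "3GPP2 (Telecommunication)",
    "skill_type": "Hard Skill",
    "skill_len": 1,
    "high_surfce_forms": {"full": "3gpp2"},
    "low_surface_forms": [],
    "match_on_tokens": False,
}


def diff_entries(a, b):
    """Return {field: (a_value, b_value)} for every field whose values differ."""
    fields = sorted(set(a) | set(b))
    return {f: (a.get(f), b.get(f)) for f in fields if a.get(f) != b.get(f)}


differences = diff_entries(mine, theirs)
for field, (my_value, their_value) in differences.items():
    print(f"{field}: mine={my_value!r} | theirs={their_value!r}")
```

For this entry the differing fields are `skill_type`, `skill_len`, `high_surfce_forms` and `low_surface_forms`; `skill_name` and `match_on_tokens` are identical. The same function could be run over every key shared by the two JSON files to see how widespread the differences are.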

Kkkassini, Oct 05 '23 18:10