SkillNER
Issue when creating skill db
Following how_new_db.md, I created a custom skill_db_relax_20.json, but there are slight differences between my version and yours.
For example, my version has:
"KS1201Q70VWZPS6KTMFR": {
    "skill_name": "3GPP2 (Telecommunication)",
    "skill_type": "Specialized Skill",
    "skill_len": 2,
    "high_surfce_forms": { "full": "3gpp2 telecommun" },
    "low_surface_forms": [ "3gpp2 telecommun", "telecommun 3gpp2" ],
    "match_on_tokens": false
}
whereas your version has:
"KS1201Q70VWZPS6KTMFR": {
    "skill_name": "3GPP2 (Telecommunication)",
    "skill_type": "Hard Skill",
    "skill_len": 1,
    "high_surfce_forms": { "full": "3gpp2" },
    "low_surface_forms": [],
    "match_on_tokens": false
}
With the classic stemming approach, high_surfce_forms comes out as in my version, whereas your version contains the correct abbreviation. I also see that you use abv = SKILL_DB[key]['abbreviation'], but the EMSI endpoint never returns any abbreviation information.
Do you retrieve this information from another source, or was it added manually?
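To make the difference concrete, here is a minimal sketch that reproduces both entries above. It is not SkillNER's actual code: build_entry, clean_name and toy_stem are hypothetical names, toy_stem is a toy stand-in for a real stemmer that only strips the suffix needed here, the reversed-bigram rule for low_surface_forms is inferred from my output, and the abbreviation override only mimics what reading SKILL_DB[key]['abbreviation'] appears to do. skill_type is passed in because it comes from the EMSI data.

```python
import re
from typing import Callable, List, Optional


def clean_name(skill_name: str) -> List[str]:
    # Lowercase, replace parentheses/punctuation with spaces, split into tokens.
    return re.sub(r"[^a-z0-9 ]", " ", skill_name.lower()).split()


def toy_stem(token: str) -> str:
    # Hypothetical stand-in for a real stemmer: strips only "ication".
    return token[: -len("ication")] if token.endswith("ication") else token


def build_entry(skill_name: str, skill_type: str,
                stem: Callable[[str], str],
                abbreviation: Optional[str] = None) -> dict:
    if abbreviation:
        # Assumption: a known abbreviation replaces the stemmed full form
        # and no low surface forms are generated.
        return {
            "skill_name": skill_name,
            "skill_type": skill_type,
            "skill_len": 1,
            "high_surfce_forms": {"full": abbreviation.lower()},
            "low_surface_forms": [],
            "match_on_tokens": False,
        }
    tokens = [stem(t) for t in clean_name(skill_name)]
    full = " ".join(tokens)
    low = [full]
    if len(tokens) == 2:
        # Assumption (inferred from my output): add the reversed bigram.
        low.append(" ".join(reversed(tokens)))
    return {
        "skill_name": skill_name,
        "skill_type": skill_type,
        "skill_len": len(tokens),
        "high_surfce_forms": {"full": full},
        "low_surface_forms": low,
        "match_on_tokens": False,
    }
```

Calling build_entry("3GPP2 (Telecommunication)", "Specialized Skill", toy_stem) gives the stemmed entry from my version, while passing abbreviation="3GPP2" gives the shape of yours. The abbreviation value itself is exactly what I cannot find in the EMSI response.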