SkillsExtractorCognitiveSearch
Remove absolute duplicates from skill_patterns.jsonl
I was going through the data when I saw a few instances of duplicated patterns.
I wrote a quick Python script to remove absolute duplicates (entries whose objects are completely equal):
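import json

seen = set()          # canonical JSON strings already encountered
unique_objects = []   # first occurrence of each pattern, in file order

with open("skill_patterns.jsonl") as h:
    for line in h:
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of failing in json.loads
        obj = json.loads(line)
        # Compare parsed objects, so key order and spacing do not matter
        key = json.dumps(obj, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique_objects.append(obj)

# Overwrite the file with one compact JSON object per line
with open("skill_patterns.jsonl", "w") as h:
    for item in unique_objects:
        h.write(json.dumps(item, separators=(",", ":")) + "\n")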
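To see what would be removed before overwriting the file, a read-only check along these lines might be useful. This is only a sketch: the function name `find_duplicate_lines` is hypothetical, and it assumes the file holds one JSON object per line. Objects are compared after parsing, so two lines that differ only in key order or spacing count as the same pattern.

```python
import json
from collections import Counter


def find_duplicate_lines(path):
    """Return {canonical_json: count} for every object that appears more than once."""
    counts = Counter()
    with open(path, encoding="utf-8") as h:
        for line in h:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            # sort_keys gives equal objects the same string regardless of key order
            counts[json.dumps(json.loads(line), sort_keys=True)] += 1
    return {key: n for key, n in counts.items() if n > 1}
```

Printing the result of `find_duplicate_lines("skill_patterns.jsonl")` lists each repeated pattern with how many times it occurs; the file itself is not modified.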