dgl-ke
dgl-ke copied to clipboard
Invalid separator for raw triplet data
Using version 0.1.2 Following the documentation here,
The recommended delimiter includes \t, |, , and ;
When I give the following mapping to DGLKE using --raw_data flag:
0,"Beijing"
1,"China"
2,"Paris"
3,"France"
…
I get an error invalid literal for int() with base 10: '0,"Beijing"'
At line 71 of file utils.py, the function load_taw_triplet_data(…):
if emap_f is not None:
eid_map = {}
id2e_map = {}
with open(emap_f, 'r') as f:
reader = csv.reader(f, delimiter='\t')
for row in reader:
eid_map[row[1]] = int(row[0])
id2e_map[int(row[0])] = row[1]
if rmap_f is not None:
rid_map = {}
id2r_map = {}
with open(rmap_f, 'r') as f:
reader = csv.reader(f, delimiter='\t')
for row in reader:
rid_map[row[1]] = int(row[0])
id2r_map[int(row[0])] = row[1]
Why the documentation say that we can use different separators, and only the \t character is allowed here ?
emap_f and rmap_f is the entity id mapping file and relation id mapping file. Are you using raw_udd_{htr} ?
Yes I used raw_udd_{hrt}.
I just misread the documentation, the recommended delimiters are for the knowledge graph files.
I just don't understand why the given example mapping for entities and relations are separated with , and the only allowed separator is \t
Yes I used
raw_udd_{hrt}.I just misread the documentation, the recommended delimiters are for the knowledge graph files.
I just don't understand why the given example mapping for entities and relations are separated with
,and the only allowed separator is\t
This should a bug in the raw_udd_ parser. Need to fix it.
@classicsong I'm seeing this issue and I used the custom separator: |
example mapping file
$ head entities.tsv
0|XXXX
1|Ton
2|Convertible
..
Is this a matter of adding the separator argument to the signature and passing it along?
Places where this proposal would change the code:
add in entrypoint
-
https://github.com/awslabs/dgl-ke/blob/master/python/dglke/infer_emb_sim.py#L27 (add to arg parser) and pass it as an arg to necessary
load_*methods in theinfer_emb_sim.pymodulemainfunction -
https://github.com/awslabs/dgl-ke/blob/master/python/dglke/infer_score.py#L27
-
and pass along (where necessary to
load_*methods inmain.
add in signatures
- https://github.com/awslabs/dgl-ke/blob/master/python/dglke/utils.py#L60
- https://github.com/awslabs/dgl-ke/blob/master/python/dglke/utils.py#L150
- https://github.com/awslabs/dgl-ke/blob/master/python/dglke/utils.py#L161
pass separator values
- https://github.com/awslabs/dgl-ke/blob/master/python/dglke/utils.py#L65
- https://github.com/awslabs/dgl-ke/blob/master/python/dglke/utils.py#L74
- https://github.com/awslabs/dgl-ke/blob/master/python/dglke/utils.py#L154
- https://github.com/awslabs/dgl-ke/blob/master/python/dglke/utils.py#L166
@classicsong if you can give an indication of whether this general approach looks correct, I can look into making a change for the separator for these entrypoints