kge
Using a buffer for writing to a file during preprocessing
In the data preprocessing code, there is a function that writes a triple to a file:
def write_triple(f, ent, rel, t, S, P, O):
    """Write a triple to a file. """
    f.write(str(ent[t[S]]) + "\t" + str(rel[t[P]]) + "\t" + str(ent[t[O]]) + "\n")
I think writing one triple per call like this would take a lot of time when you deal with hundreds of millions of relations. A better approach would be to maintain a buffer (e.g. a string or a list of lines) and dump it to the file whenever it reaches a certain threshold.
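A minimal sketch of what I have in mind is below. The class name `BufferedTripleWriter`, the `threshold` parameter, and its default value are my own placeholders, not anything that exists in kge. Note that Python file objects returned by `open()` are already buffered internally, so the actual speedup is not guaranteed and would need to be measured on a real dataset.

```python
import io


class BufferedTripleWriter:
    """Collect formatted triples in memory and write them out in batches.

    Hypothetical helper for illustration; not part of kge.
    """

    def __init__(self, f, threshold=100_000):
        self.f = f                  # open, writable text file object
        self.threshold = threshold  # assumed: number of lines held before a flush
        self.lines = []

    def write_triple(self, ent, rel, t, S, P, O):
        # Same tab-separated output format as the existing write_triple.
        self.lines.append(f"{ent[t[S]]}\t{rel[t[P]]}\t{ent[t[O]]}\n")
        if len(self.lines) >= self.threshold:
            self.flush()

    def flush(self):
        # Write all pending lines with a single call, then empty the buffer.
        if self.lines:
            self.f.write("".join(self.lines))
            self.lines.clear()


# Usage with an in-memory file and made-up index maps.
ent = {"a": 0, "b": 1}
rel = {"r": 0}
S, P, O = 0, 1, 2

out = io.StringIO()
writer = BufferedTripleWriter(out, threshold=2)
writer.write_triple(ent, rel, ("a", "r", "b"), S, P, O)
writer.write_triple(ent, rel, ("b", "r", "a"), S, P, O)  # reaches threshold, flushes
writer.write_triple(ent, rel, ("a", "r", "a"), S, P, O)  # stays in the buffer
writer.flush()  # must be called once at the end to write the remainder
```

The one thing to be careful about is the final `flush()`: any triples still in the buffer when preprocessing ends are lost unless it is called before the file is closed.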