
Using buffer for writing to a file during preprocessing

Open bhanu77prakash opened this issue 2 years ago • 0 comments

In the data preprocessing code, there is a function that writes a triple to a file:

def write_triple(f, ent, rel, t, S, P, O):
    """Write a triple to a file. """
    f.write(str(ent[t[S]]) + "\t" + str(rel[t[P]]) + "\t" + str(ent[t[O]]) + "\n")

I think writing one triple per call like this would take a lot of time when dealing with hundreds of millions of relations. A better approach would be to maintain a buffer (e.g. a string) and write it to the file whenever it reaches a certain threshold.
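A minimal sketch of what I have in mind, assuming the triples are available as an iterable and that `ent`, `rel`, `S`, `P`, `O` have the same meaning as in `write_triple`. The names `format_triple`, `write_triples_buffered` and `flush_every` are hypothetical and not part of the existing code:

```python
def format_triple(ent, rel, t, S, P, O):
    """Build the same tab-separated line that write_triple produces."""
    return str(ent[t[S]]) + "\t" + str(rel[t[P]]) + "\t" + str(ent[t[O]]) + "\n"


def write_triples_buffered(f, triples, ent, rel, S, P, O, flush_every=100_000):
    """Write triples to f, collecting lines in memory between writes.

    flush_every is a hypothetical threshold (number of buffered lines);
    a sensible value would have to be found by measurement.
    """
    buffer = []
    for t in triples:
        buffer.append(format_triple(ent, rel, t, S, P, O))
        if len(buffer) >= flush_every:
            # One write call for the whole batch instead of one per triple.
            f.write("".join(buffer))
            buffer.clear()
    if buffer:
        # Write whatever is left after the last full batch.
        f.write("".join(buffer))
```

Python file objects returned by `open()` are already buffered by default, and `open()` also accepts a `buffering` argument, so it would be worth timing this against the current code before adopting it.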

bhanu77prakash, Apr 13 '22 21:04