typedb-loader
Problematic .tsv processing
When trying to ingest from .tsv files using Loader 1.4.1 on Ubuntu 20.04, I receive the following error:
[open_alex_0::5] ERROR com.vaticle.typedb.osi.loader.loader - async-writer-4: [THW07] Invalid Thing Write: Attempted to assign a key ',' of type 'id' that had been taken by another 'researcher'.
However, I've reviewed the .tsv and confirmed there are no comma values in this column; all values are open_alex identifiers, which are URLs starting with https.
In my TypeDB config.json file, I have the separator set to a tab, and the loader successfully ingests hundreds of thousands of rows:
"separator": "\t",
Below is a screenshot confirming, using Python and pandas, that there are no commas in the id column.
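For anyone who wants to repeat this check without the screenshot, here is a minimal sketch using only the Python standard library. It reads the rows with quote handling switched off, on the assumption that quote characters in the data could make different parsers split a row differently. The function name, the sample rows, and the example URLs are made up for illustration; only the column name id and the https prefix come from the description above.

```python
import csv
import io


def find_suspect_ids(lines, id_column="id"):
    """Return (line_number, value) pairs whose id contains a comma
    or does not start with "https".

    `lines` is any iterable of text lines, e.g. an open file or io.StringIO.
    """
    # QUOTE_NONE: treat quote characters as ordinary data, split on tabs only.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    idx = header.index(id_column)
    suspects = []
    # The header is line 1, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        value = row[idx] if idx < len(row) else ""
        if "," in value or not value.startswith("https"):
            suspects.append((line_no, value))
    return suspects


# Hypothetical sample data: the second data row has a bare comma as its id.
sample = (
    "id\tname\n"
    "https://openalex.org/A1\tAda\n"
    ",\tBroken\n"
    "https://openalex.org/A2\tBob\n"
)
suspects = find_suspect_ids(io.StringIO(sample))
print(suspects)
```

On a real file, the same function could be called with `open(path, newline="", encoding="utf-8")` in place of the `io.StringIO` sample.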
I considered whether the header might be the issue, since the load fails on the second .tsv it processes and there is one record in the database with a comma as its id.
However, according to the TypeDB progress updates, it doesn't fail until it has processed over 600,000 rows.
So it does sound like the data is corrupt somehow. Have you managed to track down the duplicate ','?
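One way to look for the duplicate is sketched below, again with the standard library only. It reports id values that appear on more than one line, plus any lines whose field count differs from the header, since a shifted column is one plausible way a stray value could end up in the id position. This is a guess at a useful diagnostic, not a description of how the loader parses rows; the function name and sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def find_duplicates_and_ragged_rows(lines, id_column="id"):
    """Return (duplicates, ragged).

    duplicates: dict mapping each repeated id to the line numbers it appears on.
    ragged: line numbers whose field count differs from the header's.
    """
    # Split on tabs only; quote characters are treated as ordinary data.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    idx = header.index(id_column)
    seen = defaultdict(list)
    ragged = []
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            ragged.append(line_no)
        if idx < len(row):
            seen[row[idx]].append(line_no)
    duplicates = {key: nums for key, nums in seen.items() if len(nums) > 1}
    return duplicates, ragged


# Hypothetical sample: id "A" appears twice, and line 3 has an extra field.
sample = "id\tname\nA\tx\nB\ty\textra\nA\tz\n"
duplicates, ragged = find_duplicates_and_ragged_rows(io.StringIO(sample))
print(duplicates, ragged)
```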
There are no comma values for the id column in the source data.
Any chance that you could share the data file? Would be fine to obfuscate it as long as it reproduces the error...
Sure, I will send you a link to the 2 files I was having trouble with.