qEndpoint
GH-600 parallel parsing of NQUADS and N-Triples
Issue resolved (if any): #600
Description of this pull request:
Please check all the lines before posting the pull request:
- [ ] I've created tests for all my changes
- [ ] My pull request isn't fixing or changing multiple unlinked elements (please create one pull request for each element)
- [ ] I've applied the code formatter (`mvn formatter:format` on the backend, `npm run format` on the frontend) before posting my pull request, `mvn formatter:validate` to validate the formatting on the backend, `npm run validate` on the frontend
- [ ] All my commits have relevant names
- [ ] I've squashed my commits (if necessary)
I timed the conversion of latest-lexemes.nt.gz (from https://dumps.wikimedia.org/wikidatawiki/entities/) on an M3 Max with 16 cores. It originally took 11 minutes and now takes 7 minutes.
A few of the tests assumed that the RDF parser would return statements in a fixed and predictable order.
I fixed up a couple of them, but then found out that it's probably best to have a way to enable/disable parallel parsing.
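For tests that should not depend on statement order, one option is to compare the parsed statements as a multiset rather than as a sequence. Here is a minimal sketch of that idea; `UnorderedCompare` and its methods are hypothetical names used only for illustration and are not part of this PR or of the RDF parser.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not from this PR): order-insensitive comparison of parsed items.
final class UnorderedCompare {
    private UnorderedCompare() {
    }

    // Count how often each item occurs, so duplicates are compared as well.
    static <T> Map<T, Integer> counts(List<T> items) {
        Map<T, Integer> result = new HashMap<>();
        for (T item : items) {
            result.merge(item, 1, Integer::sum);
        }
        return result;
    }

    // True when both lists hold the same items the same number of times, in any order.
    static <T> boolean sameElements(List<T> expected, List<T> actual) {
        return counts(expected).equals(counts(actual));
    }
}
```

This relies on the items having consistent `equals` and `hashCode` implementations. It only helps where the order really is irrelevant to the test; tests that check ordering itself would still need parallel parsing disabled.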
All the tests are passing now, but I still need to double-check the performance to confirm that it's as good as expected.
Can you start testing it, @ate47?
I think you can also take a look at the ExceptionThread class. I wrote it to link threads together while keeping track of their exceptions.
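To illustrate the general idea of linking threads while keeping track of their exceptions, here is a minimal sketch. It is not the actual ExceptionThread API: `LinkedWorkers`, `Task`, `add`, `startAll` and `joinAll` are hypothetical names. The sketch assumes that the first failure should interrupt the sibling threads and be rethrown to the caller on join.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch (not the real ExceptionThread): a group of linked worker threads.
final class LinkedWorkers {
    // A task that is allowed to throw checked exceptions.
    interface Task {
        void run() throws Exception;
    }

    private final List<Thread> threads = new ArrayList<>();
    private final AtomicReference<Throwable> firstFailure = new AtomicReference<>();

    // Register a task; all tasks must be added before startAll() is called.
    LinkedWorkers add(String name, Task task) {
        Thread thread = new Thread(() -> {
            try {
                task.run();
            } catch (Throwable failure) {
                if (firstFailure.compareAndSet(null, failure)) {
                    // First failure: ask the sibling threads to stop.
                    for (Thread other : threads) {
                        if (other != Thread.currentThread()) {
                            other.interrupt();
                        }
                    }
                } else {
                    // Later failures are attached to the first one instead of being lost.
                    firstFailure.get().addSuppressed(failure);
                }
            }
        }, name);
        threads.add(thread);
        return this;
    }

    void startAll() {
        threads.forEach(Thread::start);
    }

    // Wait for every thread, then rethrow the first recorded failure, if any.
    void joinAll() throws Exception {
        for (Thread thread : threads) {
            thread.join();
        }
        Throwable failure = firstFailure.get();
        if (failure != null) {
            throw new Exception("worker thread failed", failure);
        }
    }
}
```

A caller would register a reader and one or more parser tasks with `add`, call `startAll()`, and then `joinAll()`; any exception thrown inside a worker surfaces as the cause of the exception from `joinAll()`.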