
The parsing of larger TTL files seems to take a big performance hit from v1.2.x on

HugaertsDries opened this issue 4 years ago · 6 comments

When trying to upgrade from v1.1.x to v1.2.x, I noticed a big performance hit when parsing files larger than 100 kB (122.9 kB, to be exact). Did something change in how files should be parsed?

The code used is a variant of the following:

import { graph as rdflibGraph, parse as rdflibParse } from 'rdflib';

const SOURCE_GRAPH = 'http://data.lblod.info/graphs/submission';

export function parse(sourceTtl) {
    let store = rdflibGraph();
    rdflibParse(sourceTtl, store, SOURCE_GRAPH, 'text/turtle');
    return store;
}
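
A minimal sketch of how the slowdown can be measured across versions (the measureParse helper is hypothetical, and the stub parser below only stands in for rdflib's parse so the sketch is self-contained):

```javascript
// Hypothetical timing helper: runs a parse function on a TTL string and
// reports the elapsed wall-clock time in milliseconds.
function measureParse(parseFn, sourceTtl, label) {
  const start = Date.now();
  parseFn(sourceTtl);
  const elapsed = Date.now() - start;
  console.log(`${label}: ${elapsed} ms`);
  return elapsed;
}

// Stub parser used here for illustration only; substitute rdflib's parse.
// It pretends each non-empty line is one triple.
function stubParse(sourceTtl) {
  return sourceTtl.split('\n').filter((line) => line.trim() !== '').length;
}

measureParse(stubParse, 'a b c .\nd e f .', 'stub');
```

Running the same harness against the real parse under v1.1.x and v1.2.x on the same file should make the regression easy to quantify.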

Thx in advance!

HugaertsDries avatar May 13 '20 14:05 HugaertsDries

Hmm, I suspect this might be tied to my changes in https://github.com/linkeddata/rdflib.js/commit/6d6284f2a18a98b8fad38a3ad812650f074507d2 =\ I don't know if you're able to test?

@timbl Maybe you have capacity?

megoth avatar May 13 '20 15:05 megoth

The additional call to canon() you suggested sounds like it could be it.

timbl avatar May 15 '20 07:05 timbl

@megoth any suggestions on how I could test it?

HugaertsDries avatar May 15 '20 15:05 HugaertsDries

You could comment out the calls to canon and the update to this.index in src/store.ts in this function:

  add (
    subj: Quad_Subject | Quad | Quad[] | Statement | Statement[],
    pred?: Quad_Predicate,
    obj?: Term | string,
    why?: Quad_Graph
  ): Quad | null | IndexedFormula ...

If you want to play around directly with the JS (avoid the babel step), you could look in lib/store.js for

    key: "add",
    value: function add(subj, pred, obj, why) ...

If you're in a browser, you may want to disable minification by adding this to webpack.config.js:

optimization: {minimize: false},
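
As an alternative to editing the source, the cost of add could also be isolated by wrapping the method with a timer at runtime. A rough sketch, assuming a store-like object with an add method (the stubStore and timeMethod names here are illustrative, not rdflib API):

```javascript
// Rough sketch: wrap an object's method so every call is timed and counted.
// In practice you would wrap an rdflib store's `add`; `stubStore` below is
// only a placeholder so the sketch runs on its own.
function timeMethod(obj, name) {
  const original = obj[name];
  let totalMs = 0;
  let calls = 0;
  obj[name] = function (...args) {
    const start = Date.now();
    const result = original.apply(this, args);
    totalMs += Date.now() - start;
    calls += 1;
    return result;
  };
  // Returns a reporter so the totals can be read after a parse run.
  return () => ({ calls, totalMs });
}

const stubStore = { add(s, p, o) { return [s, p, o]; } };
const report = timeMethod(stubStore, 'add');
stubStore.add('a', 'b', 'c');
stubStore.add('d', 'e', 'f');
console.log(report());
```

Comparing the reported total against the overall parse time should show how much of the regression is spent inside add (and, by extension, in the canon calls it makes).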

ericprud avatar May 15 '20 17:05 ericprud

Hi everyone, any update on this? I'm also experiencing significant performance hits (up to 10x slower) when parsing large RDF/XML files (tens of MB).

For instance, the NAL thesaurus (https://agclass.nal.usda.gov/downloads/NAL_Thesaurus_2020_SKOS.zip?agree3=on&image.x=45&image.y=15) takes more than 2 minutes to parse on my laptop, while it used to take 20 to 30 seconds on previous versions (I was on 1.0.6 before upgrading to 1.2.2).

TommasoBianchi avatar Jun 05 '20 15:06 TommasoBianchi

Up.

TommasoBianchi avatar Jul 28 '20 13:07 TommasoBianchi