
data for pretraining SPBERT

Open · aosokin opened this issue 2 years ago · 2 comments

Hi, thanks for releasing the code for your method and the weights for your models!

While reading your paper, I got very interested in the data you used to pre-train SPBERT. The paper says the following: "To prepare a large-scale pre-training corpus, we leverage SPARQL queries from end-users, massive and highly diverse structures. These query logs can be obtained from the DBpedia endpoint powered by a Virtuoso instance. We only focus on valid DBpedia query logs spans from October 2015 to April 2016."
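If I understand the quoted passage correctly, one step is filtering raw query logs down to valid SPARQL. A rough sketch of such a filter is below; this is only a crude heuristic under the assumption of plain-text logs with one query per entry (the paper does not specify the exact check, and a real pipeline would presumably use a full SPARQL parser or the Virtuoso endpoint itself to validate queries):

```python
import re

# Queries should start with a SPARQL query form or prologue keyword.
# (Assumption: logs are plain text, one query per entry.)
SPARQL_START = re.compile(r"^\s*(SELECT|ASK|CONSTRUCT|DESCRIBE|PREFIX|BASE)\b",
                          re.IGNORECASE)

def looks_like_valid_sparql(query: str) -> bool:
    """Crude validity heuristic: a SPARQL keyword up front and balanced braces.
    A real filter would run the query through an actual SPARQL parser."""
    return (bool(SPARQL_START.match(query))
            and "{" in query
            and query.count("{") == query.count("}"))

# Hypothetical log entries for illustration (not from the actual DBpedia logs):
raw_logs = [
    "SELECT ?x WHERE { ?x a <http://dbpedia.org/ontology/City> }",
    "SELEC ?x WHERE { broken",  # malformed: typo in keyword, unbalanced braces
]
valid_queries = [q for q in raw_logs if looks_like_valid_sparql(q)]
# keeps only the first, well-formed query
```
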

Could you please explain in more detail how to get this data? Would it be possible for you to release the exact corpus used for pre-training?

Best, Anton

aosokin · Sep 27 '21 09:09