rdflib
Establish clear provenance for all non-original test data.
All non-original test data should have clear provenance so that we know we are testing the right things; this is in part to mitigate problems like this. The best way to establish provenance is to download test data programmatically, make it possible to re-download the test data as part of our test run, and then ensure it has not changed.
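A minimal Python sketch of the "re-download and check that nothing changed" step described above. It assumes the upstream URL of each file is already known; `fetch`, `sha256_hex` and `is_unchanged` are hypothetical helper names, not existing rdflib APIs.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Optional


def fetch(url: str, accept: Optional[str] = None) -> bytes:
    """Download ``url`` and return the body; urllib follows redirects like ``curl -L``."""
    headers = {"Accept": accept} if accept else {}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return response.read()


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest, suitable for recording next to each source URL."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(path: Path, fresh: bytes) -> bool:
    """True if the checked-in file at ``path`` matches freshly downloaded bytes."""
    if not path.exists():
        return False
    return sha256_hex(path.read_bytes()) == sha256_hex(fresh)
```

A test could then assert `is_unchanged(path, fetch(url, accept="text/turtle"))` for each declared file. Recording the `sha256_hex` digest alongside the URL would also let offline test runs detect local edits without network access.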
It would be good to solve this before adding more test data.
I started working on a Makefile for this, but I think doing it from Python may be more sensible: people working on this library likely know Python better than GNU Make, and Python is much more portable and less quirky than GNU Make.
```makefile
# This file exists mainly to declaratively establish the provenance of test data.
# Running this file with `make -B all` should re-download all test data with established provenance and should result in no changes to the files on disk.

all:

all: rdfs.ttl
rdfs.ttl:
	curl -L --header "Accept: text/turtle" http://www.w3.org/2000/01/rdf-schema# > $(@)

all: defined_namespaces/qb.ttl
defined_namespaces/qb.ttl:
	curl -L --header "Accept: text/turtle" http://purl.org/linked-data/cube > $(@)

all: suites/w3c/turtle/README
suites/w3c/turtle/README:
	rm -vr $(dir $(@)) || true
	mkdir -vp $(dir $(@))
	curl https://www.w3.org/2013/TurtleTests/TESTS.tar.gz | tar -zxvf - --strip-components=1 -C $(dir $(@))

all: suites/w3c/nquads/README
suites/w3c/nquads/README:
	rm -vr $(dir $(@)) || true
	mkdir -vp $(dir $(@))
	curl https://www.w3.org/2013/N-QuadsTests/TESTS.tar.gz | tar -zxvf - --strip-components=1 -C $(dir $(@))

all: suites/w3c/ntriples/README
suites/w3c/ntriples/README:
	rm -vr $(dir $(@)) || true
	mkdir -vp $(dir $(@))
	curl https://www.w3.org/2013/N-TriplesTests/TESTS.tar.gz | tar -zxvf - --strip-components=1 -C $(dir $(@))

all: suites/w3c/trig/README
suites/w3c/trig/README:
	rm -vr $(dir $(@)) || true
	mkdir -vp $(dir $(@))
	curl https://www.w3.org/2013/TrigTests/TESTS.tar.gz | tar -zxvf - --strip-components=1 -C $(dir $(@))

# TODO FIXME: This directory contains additional files that should be removed:
# - Manifest.rdf
# - datatypes/test001.borked
all: suites/w3c/rdfxml/README
suites/w3c/rdfxml/README:
	rm -vr $(dir $(@)) || true
	mkdir -vp $(dir $(@))
	curl https://www.w3.org/2013/RDFXMLTests/TESTS.tar.gz | tar -zxvf - --strip-components=1 -C $(dir $(@))

# TODO FIXME: This directory differs from upstream; it seems to be from an older source.
all: suites/DAWG/data-sparql11/manifest-all.ttl
suites/DAWG/data-sparql11/manifest-all.ttl:
	rm -vr $(dir $(@)) || true
	mkdir -vp $(dir $(@))
	curl https://www.w3.org/2009/sparql/docs/tests/sparql11-test-suite-20121023.tar.gz \
		| tar -zxvf - --strip-components=1 -C $(dir $(@))
	find $(dir $(@)) -type f -print0 | xargs -0 chmod -v 644
	find $(dir $(@)) -type f -print0 | xargs -0 dos2unix
	find $(dir $(@)) -type d -print0 | xargs -0 chmod -v 755
```
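The Makefile targets could be expressed in Python roughly as follows. This is a sketch, not a faithful port: the `TURTLE_FILES`/`TARBALLS` tables and the `download`, `extract_stripped` and `refresh_all` functions are hypothetical names, and the DAWG SPARQL 1.1 target (with its `chmod`/`dos2unix` normalisation) is left out.

```python
import io
import shutil
import tarfile
import urllib.request
from pathlib import Path, PurePosixPath
from typing import Dict, List, Optional

# Hypothetical declarative tables mirroring the Makefile targets:
# destination (relative to the test data directory) -> upstream URL.
TURTLE_FILES: Dict[str, str] = {
    "rdfs.ttl": "http://www.w3.org/2000/01/rdf-schema#",
    "defined_namespaces/qb.ttl": "http://purl.org/linked-data/cube",
}
TARBALLS: Dict[str, str] = {
    "suites/w3c/turtle": "https://www.w3.org/2013/TurtleTests/TESTS.tar.gz",
    "suites/w3c/nquads": "https://www.w3.org/2013/N-QuadsTests/TESTS.tar.gz",
    "suites/w3c/ntriples": "https://www.w3.org/2013/N-TriplesTests/TESTS.tar.gz",
    "suites/w3c/trig": "https://www.w3.org/2013/TrigTests/TESTS.tar.gz",
    "suites/w3c/rdfxml": "https://www.w3.org/2013/RDFXMLTests/TESTS.tar.gz",
}


def download(url: str, accept: Optional[str] = None) -> bytes:
    """Fetch ``url``; urllib follows redirects like ``curl -L``."""
    headers = {"Accept": accept} if accept else {}
    with urllib.request.urlopen(urllib.request.Request(url, headers=headers)) as response:
        return response.read()


def extract_stripped(archive: bytes, dest: Path, strip_components: int = 1) -> List[Path]:
    """Replace ``dest`` with the regular files of a gzipped tarball.

    Roughly ``rm -r dest; mkdir -p dest; tar -zxf - --strip-components=N -C dest``.
    """
    if dest.exists():
        shutil.rmtree(dest)
    dest.mkdir(parents=True)
    written: List[Path] = []
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # parent directories are created from file paths below
            name = PurePosixPath(member.name)
            parts = name.parts[strip_components:]
            if name.is_absolute() or not parts or ".." in parts:
                continue  # skip stripped-away entries and unsafe paths
            source = tar.extractfile(member)
            if source is None:
                continue
            target = dest.joinpath(*parts)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(source.read())
            written.append(target)
    return written


def refresh_all(base: Path) -> None:
    """Re-download every declared source into ``base``."""
    for relative, url in TURTLE_FILES.items():
        target = base / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(download(url, accept="text/turtle"))
    for relative, url in TARBALLS.items():
        extract_stripped(download(url), base / relative)
```

Calling `refresh_all` on the test data directory of a clean checkout and then running `git status` would play the role of `make -B all`: any reported change means the checked-in data no longer matches upstream.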
I'm working on this as part of https://github.com/RDFLib/rdflib/issues/1807 and https://github.com/RDFLib/rdflib/issues/1701, as I want to download N3 test data from https://github.com/w3c/N3/tree/master/tests. I will write it in Python; it may be slightly more verbose than a Makefile, but Makefiles have their own host of problems.