
Using a schema with an import fails inside devtools::test()

Open shug0131 opened this issue 2 years ago • 6 comments

I have a function in a package I'm building that calls xml2::xml_validate(). This works fine when called from the console.

But inside devtools::test(), it fails with errors starting:

```
Element '{http://www.w3.org/2001/XMLSchema}import': Failed to locate a schema at location '/me-filer1/groups3$/CCTU/STATISTICS/NON%20STUDY%20FOLDER/Academic%20Research/Eudract%20Tool/R/eudract/inst/extdata/RRSUploadSchema.xsd'. Skipping the import
```

I think the relevant line in the schema itself is:

```xml
<xs:import
     namespace="http://clinicaltrials.gov/rrs"
     schemaLocation="RRSUploadSchema.xsd"
  />
```

The imported schema is available at https://prsinfo.clinicaltrials.gov/RRSUploadSchema.xsd.

So it seems to be trying to find the other file locally. Oddly, I do have a copy of the imported schema exactly where it is looking, but that doesn't help at all.

If I run devtools::load_all() and then run the same line of code in the console, rather than via devtools::test(), it works fine.

The full error message is:

```
Element '{http://www.w3.org/2001/XMLSchema}import': Failed to locate a schema at location '/me-filer1/groups3$/CCTU/STATISTICS/NON%20STUDY%20FOLDER/Academic%20Research/Eudract%20Tool/R/eudract/inst/extdata/RRSUploadSchema.xsd'. Skipping the import.
Element '{http://www.w3.org/2001/XMLSchema}element', attribute 'ref': The QName value '{http://clinicaltrials.gov/rrs}result' does not resolve to a(n) element declaration.
Element '{http://www.w3.org/2001/XMLSchema}element', attribute 'ref': The QName value '{http://clinicaltrials.gov/rrs}resultDisposition' does not resolve to a(n) element declaration.
Element '{http://clinicaltrials.gov/prs}study_collection': No matching global declaration available for the validation root.
In addition: Warning message:
In xml_validate.xml_document(original, schema_output) :
  failed to load external entity "/me-filer1/groups3$/CCTU/STATISTICS/NON%20STUDY%20FOLDER/Academic%20Research/Eudract%20Tool/R/eudract/inst/extdata/RRSUploadSchema.xsd" [1549]
```

To reproduce, see the package at https://github.com/shug0131/eudraCT/tree/master/R/eudract
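A reprex that does not depend on the eudract package or on a particular drive could build both schemas on the fly in a temporary directory. The sketch below is a hypothetical, cut-down stand-in: main.xsd plays the role of ProtocolRecordSchema.xsd, other.xsd plays the role of RRSUploadSchema.xsd, and the urn:main and urn:other namespaces are made up for illustration. Read from a local temporary directory, the relative import should resolve and validation should succeed. Running the same script with the directory on the network share, reading main.xsd through a UNC path, would be the way to test the failure described above.

```r
library(xml2)

# Throwaway directory standing in for inst/extdata (hypothetical layout).
dir <- tempfile("xsd-import-")
dir.create(dir)

# other.xsd plays the role of RRSUploadSchema.xsd: the imported schema.
writeLines(c(
  '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"',
  '           targetNamespace="urn:other" elementFormDefault="qualified">',
  '  <xs:element name="result" type="xs:string"/>',
  '</xs:schema>'
), file.path(dir, "other.xsd"))

# main.xsd plays the role of ProtocolRecordSchema.xsd: it imports other.xsd
# through a relative schemaLocation, like the <xs:import> quoted above.
writeLines(c(
  '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"',
  '           xmlns:o="urn:other"',
  '           targetNamespace="urn:main" elementFormDefault="qualified">',
  '  <xs:import namespace="urn:other" schemaLocation="other.xsd"/>',
  '  <xs:element name="root">',
  '    <xs:complexType>',
  '      <xs:sequence>',
  '        <xs:element ref="o:result"/>',
  '      </xs:sequence>',
  '    </xs:complexType>',
  '  </xs:element>',
  '</xs:schema>'
), file.path(dir, "main.xsd"))

# A document that can only validate if the imported declaration was found.
doc <- read_xml('<root xmlns="urn:main"><result xmlns="urn:other">ok</result></root>')

# Read the schema from its file path; the relative import is resolved
# against the location the schema document was read from.
schema <- read_xml(file.path(dir, "main.xsd"))
ok <- xml_validate(doc, schema)
print(ok)
```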

shug0131, Jul 30 '21 15:07

Can you please provide a minimal reprex (reproducible example)? The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it: please help me help you! If you've never heard of a reprex before, start by reading about the reprex package, including the advice further down the page. Please make sure your reprex is created with the reprex package as it gives nicely formatted output and avoids a number of common pitfalls.

hadley, Feb 28 '22 20:02

Thanks for looking at this. I'm working towards a reprex, but it's tricky, as I might need to build a file on the fly so that testthat can be run.
Anyway, it was very helpful in making me look into the problem. I think the culprit is this line in the schema:

```xml
<xs:import namespace="http://clinicaltrials.gov/rrs" schemaLocation="RRSUploadSchema.xsd" />
```

I'd like to understand where the parser used by xml2 looks for the file given by schemaLocation. I have a knowledge gap here and am struggling to find documentation to help. Does it look internally within R, or for a local file? Does it use absolute or relative file paths? Can it be tripped up by networked drives, e.g. "//me-filer1/.." in my case? Thanks for any pointers.
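One way to see where the import will be looked up, assuming libxml2 resolves a relative schemaLocation against the location recorded for the schema document (which fits the full extdata path in the error message above), is to ask xml2 for that location with xml_url() and resolve the relative reference with url_absolute(). import_lookup() is a hypothetical helper, not part of xml2, and url_absolute() is only expected to approximate what the schema parser does internally.

```r
library(xml2)

# Hypothetical helper (not part of xml2): resolve a relative schemaLocation
# against the location recorded when the schema document was read.
import_lookup <- function(schema, location) {
  base <- xml_url(schema)       # location recorded for the schema document
  url_absolute(location, base)  # resolve the relative reference against it
}

# Usage in the setting from this thread (requires the eudract package):
# schema_output <- read_xml(system.file("extdata", "ProtocolRecordSchema.xsd",
#                                       package = "eudract"))
# xml_url(schema_output)
# import_lookup(schema_output, "RRSUploadSchema.xsd")
```

Comparing the result for a schema read through the mapped U: drive with one read through the UNC library path should show whether the network prefix is lost, as the single leading slash in '/me-filer1/...' in the error message suggests.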

shug0131, Mar 08 '22 10:03

I don't know either, but I'd imagine it looks either in the current working directory or in a path relative to the XML file you're reading. You might have to read the spec to find out.

hadley, Mar 08 '22 13:03

Could you be more explicit about what you mean by "the spec", please?

OK, running reprex() produced:

```
x Install the styler package in order to use `style = TRUE`.
i Rendering reprex...
pandoc.exe: \\: openBinaryFile: invalid argument (Invalid argument)
Error: pandoc document conversion failed with error 1
```

But this is the best I can do to illustrate it. The problem comes down to network drives and .libPaths().

```r
install.packages("eudract")
rm(list = ls())
# Force it to use a mapped drive on Windows
.libPaths("U:/My Documents/R/win-library/4.1")
original <- system.file("extdata", "1234.xml", package = "eudract")
original <- xml2::read_xml(original)
schema_output <- system.file("extdata", "ProtocolRecordSchema.xsd", package = "eudract")
schema_output <- xml2::read_xml(schema_output)
xml2::xml_validate(original, schema_output)
# Works fine
.libPaths("\\\\me-filer1/home$/sjb277/My Documents/R/win-library/4.1")
# This is the same directory as before, but given as a non-mapped network drive path
xml2::xml_validate(original, schema_output)
# Still works
# Read in the schema afresh
schema_output <- system.file("extdata", "ProtocolRecordSchema.xsd", package = "eudract")
schema_output <- xml2::read_xml(schema_output)
xml2::xml_validate(original, schema_output)
# Now it fails
```
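Since reading the schema afresh from the UNC library path is what triggers the failure, one possible workaround is to read the schema from a local copy. This is a minimal sketch, assuming every .xsd file the main schema imports sits next to it in the same directory (as RRSUploadSchema.xsd appears to in inst/extdata). read_schema_local() is a hypothetical helper, not part of xml2 or eudract.

```r
library(xml2)

# Hypothetical workaround helper (not part of xml2 or eudract): copy every
# .xsd file that sits next to the schema into a fresh local temporary
# directory, then read the schema from that copy, so relative schemaLocation
# imports are resolved against the local copy instead of the network path
# the package was installed under.
read_schema_local <- function(path) {
  local_dir <- tempfile("xsd-")
  dir.create(local_dir)
  xsd_files <- list.files(dirname(path), pattern = "\\.xsd$", full.names = TRUE)
  copied <- file.copy(xsd_files, local_dir)
  if (!all(copied)) {
    stop("could not copy schema files to ", local_dir)
  }
  read_xml(file.path(local_dir, basename(path)))
}

# Usage in the setting from this thread (requires the eudract package):
# schema_output <- read_schema_local(
#   system.file("extdata", "ProtocolRecordSchema.xsd", package = "eudract")
# )
# xml_validate(original, schema_output)
```

Copying to tempdir() sidesteps the network path rather than fixing how it is resolved, so it is a stopgap under the stated assumption, not a fix in xml2 itself.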

shug0131, Mar 08 '22 15:03

The spec = the xml schema spec I guess? The first part of the problem is always figuring out which spec applies 😄

hadley, Mar 08 '22 17:03

spec link

Can't say I'm any the wiser from a quick read-through, but it's still possibly useful to have tracked it down.

shug0131, Mar 09 '22 09:03