exist
exist copied to clipboard
[BUG] file:sync doesn't respect conf.xml serializer; hard codes serialization parameters
Describe the bug
The file:sync
function, which writes collection contents out to disk, hard codes serialization parameters such as indent=yes
. This inserts whitespace into data and can pollute application-sensitive whitespace practices and lead to undesired source control changes.
Expected behavior
All functions that serialize data should, at bare minimum, respect the <serializer>
settings in conf.xml.
See https://github.com/eXist-db/exist/blob/develop/extensions/modules/file/src/main/java/org/exist/xquery/modules/file/Sync.java#L91-L96.
To Reproduce
The final expression below returns false()
even when conf.xml's serializer/@indent
is set to "no"
. If it respected this setting, the expression would return true()
.
xquery version "3.1";
xmldb:create-collection("/db", "test"),
xmldb:store("/db/test", "test.xml", <foo><bar/></foo>),
file:mkdir("/Users/joe/test"),
file:sync("/db/test", "/Users/joe/test", ()),
deep-equal(<foo><bar/></foo>, doc("file:/Users/joe/test/test.xml"))
Adapting this to xqsuite:
xquery version "3.1";
module namespace t="http://exist-db.org/xquery/test";
declare namespace test="http://exist-db.org/xquery/xqsuite";
declare variable $t:XML := document { <foo><bar/></foo> };
declare variable $t:test-col-DB := "/db/test";
declare variable $t:test-col-FS := "file:" || util:system-property("java.io.tmpdir") || "org.exist.test";
declare
%test:setUp
function t:setup() {
xmldb:create-collection("/db", "test"),
xmldb:store($t:test-col-DB, "test.xml", $t:XML),
file:mkdir($t:test-col-FS),
file:sync($t:test-col-DB, $t:test-col-FS, ())
};
declare
%test:tearDown
function t:tearDown() {
xmldb:remove($t:test-col-DB),
file:delete($t:test-col-FS)
};
declare
%test:assertTrue
function t:test() {
deep-equal(
doc($t:test-col-DB || "/test.xml"),
doc($t:test-col-FS || "/test.xml")
)
};
Context (please always complete the following information):
- OS: macOS 11.1
- eXist-db version: eXist 5.3.0-SNAPSHOT 35a76f25f0431220de95ff43752232bbf6a59efc 20210104165718
- Java Version: 1.8.0_275
Additional context
- How is eXist-db installed? built from source
- Any custom changes in e.g.
conf.xml
? Toggledserializer/@indent
between default value,yes
, and custom value,no
I've updated the report to add an xqsuite version of the test.
Here's how eXist's implementation of fn:serialize
gets serialization parameters from conf.xml: https://github.com/eXist-db/exist/blob/adebc77a884fea5e0120c25290b9767ef01dce08/exist-core/src/main/java/org/exist/xquery/functions/fn/FunSerialize.java#L94-L106.
At least I think so. Here's SerializerUtils.getSerializationOptions
: https://github.com/eXist-db/exist/blob/303ca0db511ee93bac1b50ccede357fde9658e7b/extensions/modules/file/src/main/java/org/exist/xquery/modules/file/SerializeToFile.java#L174-L196. Are the defaults for getSerializationOptions
taken from conf.xml, or they defined in ParameterConvention
? https://github.com/eXist-db/exist/blob/303ca0db511ee93bac1b50ccede357fde9658e7b/extensions/modules/file/src/main/java/org/exist/xquery/modules/file/SerializeToFile.java#L174-L196.
(I've also spotted hardcoded serialization parameters in file:serialize
, though this function also allows you to override the hardcoded values: https://github.com/eXist-db/exist/blob/303ca0db511ee93bac1b50ccede357fde9658e7b/extensions/modules/file/src/main/java/org/exist/xquery/modules/file/SerializeToFile.java#L174-L196. It's puzzling why file:serialize
needs to hardcode parameters at all. Shouldn't it just read from conf.xml?)
refs #4081
@line-o I believe this was fixed by your #4254, wasn't it?
@joewiz I just tested it and it appears that settings in conf.xml do not affect serialization of file:sync.
Digging a little deeper in the implementation of fn:serialize the settings in conf.xml are present in the Serializerpool instances. Borrowing one of them would do the trick.
https://github.com/eXist-db/exist/blob/adebc77a884fea5e0120c25290b9767ef01dce08/exist-core/src/main/java/org/exist/util/serializer/XQuerySerializer.java#L85-L103
This change is also considered as a breaking by @adamretter
@line-o Ah, I see! Yes, with your PR, we can set serialization parameters, but file:sync just doesn't yet respect defaults in conf.xml. I can see why a new PR causing file:sync to respect conf.xml defaults could break existing code that didn't specify serialization params and would probably need to be in a major release. Thanks!
Okay, so if I understand correctly. This should respect the conf.xml
serializer settings. I also think @line-o mentioned that he can do that relatively easily. I also respect that doing so might be a breaking change. So if we can do that, let's schedule this for eXist-db 7.0.0. Does that suit everyone?
@adamretter Yes, I just checked myself and performed the test in the report above with a build of eXist develop (eXist 6.1.0-SNAPSHOT 5e23335288da3f146981e335197efce51de5664e 20220719105055) whose conf.xml's serializer/@indent is set to "no". The file was synced to disk with indentation, i.e., as follows:
<?xml version="1.0" encoding="UTF-8"?>
<foo>
<bar/>
</foo>
This indicates that file:sync does not currently respect conf.xml.
I agree that a fix would count as a breaking change, so the issue is correctly labeled as 7.0.0.