exist icon indicating copy to clipboard operation
exist copied to clipboard

[BUG] file:sync doesn't respect conf.xml serializer; hard codes serialization parameters

Open joewiz opened this issue 4 years ago • 9 comments

Describe the bug

The file:sync function, which writes collection contents out to disk, hard codes serialization parameters such as indent=yes. This inserts whitespace into data and can pollute application-sensitive whitespace practices and lead to undesired source control changes.

Expected behavior

All functions that serialize data should, at bare minimum, respect the <serializer> settings in conf.xml.

See https://github.com/eXist-db/exist/blob/develop/extensions/modules/file/src/main/java/org/exist/xquery/modules/file/Sync.java#L91-L96.

To Reproduce

The final expression below returns false() even when conf.xml's serializer/@indent is set to "no". If it respected this setting, the expression would return true().

xquery version "3.1";

xmldb:create-collection("/db", "test"),
xmldb:store("/db/test", "test.xml", <foo><bar/></foo>),
file:mkdir("/Users/joe/test"),
file:sync("/db/test", "/Users/joe/test", ()),
deep-equal(<foo><bar/></foo>, doc("file:/Users/joe/test/test.xml"))

Adapting this to xqsuite:

xquery version "3.1";

module namespace t="http://exist-db.org/xquery/test";

declare namespace test="http://exist-db.org/xquery/xqsuite";

declare variable $t:XML := document { <foo><bar/></foo> };
declare variable $t:test-col-DB := "/db/test";
declare variable $t:test-col-FS := "file:" || util:system-property("java.io.tmpdir") || "org.exist.test";

declare
    %test:setUp
function t:setup() {
    xmldb:create-collection("/db", "test"),
    xmldb:store($t:test-col-DB, "test.xml", $t:XML),
    file:mkdir($t:test-col-FS),
    file:sync($t:test-col-DB, $t:test-col-FS, ())
};

declare
    %test:tearDown
function t:tearDown() {
    xmldb:remove($t:test-col-DB),
    file:delete($t:test-col-FS)
};

declare
    %test:assertTrue
function t:test() {
    deep-equal(
        doc($t:test-col-DB || "/test.xml"), 
        doc($t:test-col-FS || "/test.xml")
    )
};

Context (please always complete the following information):

  • OS: macOS 11.1
  • eXist-db version: eXist 5.3.0-SNAPSHOT 35a76f25f0431220de95ff43752232bbf6a59efc 20210104165718
  • Java Version: 1.8.0_275

Additional context

  • How is eXist-db installed? built from source
  • Any custom changes in e.g. conf.xml? Toggled serializer/@indent between default value, yes, and custom value, no

joewiz avatar Jan 12 '21 14:01 joewiz

I've updated the report to add an xqsuite version of the test.

joewiz avatar Jan 12 '21 16:01 joewiz

Here's how eXist's implementation of fn:serialize gets serialization parameters from conf.xml: https://github.com/eXist-db/exist/blob/adebc77a884fea5e0120c25290b9767ef01dce08/exist-core/src/main/java/org/exist/xquery/functions/fn/FunSerialize.java#L94-L106.

At least I think so. Here's SerializerUtils.getSerializationOptions: https://github.com/eXist-db/exist/blob/303ca0db511ee93bac1b50ccede357fde9658e7b/extensions/modules/file/src/main/java/org/exist/xquery/modules/file/SerializeToFile.java#L174-L196. Are the defaults for getSerializationOptions taken from conf.xml, or they defined in ParameterConvention? https://github.com/eXist-db/exist/blob/303ca0db511ee93bac1b50ccede357fde9658e7b/extensions/modules/file/src/main/java/org/exist/xquery/modules/file/SerializeToFile.java#L174-L196.

(I've also spotted hardcoded serialization parameters in file:serialize, though this function also allows you to override the hardcoded values: https://github.com/eXist-db/exist/blob/303ca0db511ee93bac1b50ccede357fde9658e7b/extensions/modules/file/src/main/java/org/exist/xquery/modules/file/SerializeToFile.java#L174-L196. It's puzzling why file:serialize needs to hardcode parameters at all. Shouldn't it just read from conf.xml?)

joewiz avatar Jan 14 '21 17:01 joewiz

refs #4081

line-o avatar Dec 06 '21 19:12 line-o

@line-o I believe this was fixed by your #4254, wasn't it?

joewiz avatar Jul 18 '22 03:07 joewiz

@joewiz I just tested it and it appears that settings in conf.xml do not affect serialization of file:sync.

Digging a little deeper in the implementation of fn:serialize the settings in conf.xml are present in the Serializerpool instances. Borrowing one of them would do the trick.

https://github.com/eXist-db/exist/blob/adebc77a884fea5e0120c25290b9767ef01dce08/exist-core/src/main/java/org/exist/util/serializer/XQuerySerializer.java#L85-L103

line-o avatar Jul 21 '22 10:07 line-o

This change is also considered as a breaking by @adamretter

line-o avatar Jul 21 '22 10:07 line-o

@line-o Ah, I see! Yes, with your PR, we can set serialization parameters, but file:sync just doesn't yet respect defaults in conf.xml. I can see why a new PR causing file:sync to respect conf.xml defaults could break existing code that didn't specify serialization params and would probably need to be in a major release. Thanks!

joewiz avatar Jul 21 '22 16:07 joewiz

Okay, so if I understand correctly. This should respect the conf.xml serializer settings. I also think @line-o mentioned that he can do that relatively easily. I also respect that doing so might be a breaking change. So if we can do that, let's schedule this for eXist-db 7.0.0. Does that suit everyone?

adamretter avatar Jul 21 '22 20:07 adamretter

@adamretter Yes, I just checked myself and performed the test in the report above with a build of eXist develop (eXist 6.1.0-SNAPSHOT 5e23335288da3f146981e335197efce51de5664e 20220719105055) whose conf.xml's serializer/@indent is set to "no". The file was synced to disk with indentation, i.e., as follows:

<?xml version="1.0" encoding="UTF-8"?>
<foo>
    <bar/>
</foo>

This indicates that file:sync does not currently respect conf.xml.

I agree that a fix would count as a breaking change, so the issue is correctly labeled as 7.0.0.

joewiz avatar Jul 21 '22 20:07 joewiz