dita-ot icon indicating copy to clipboard operation
dita-ot copied to clipboard

A <mapref> with a @keys attribute is processed incorrectly

Open chrispy-snps opened this issue 4 years ago • 8 comments

Expected Behavior

If <mapref> elements have the @keys attribute defined:

3815_1

then publishing should yield the same results as if @keys was not defined.

Actual Behavior

The content is published incorrectly in the preprocess2 pipeline, as described below. (preprocess is also affected when force-unique is used - see comments below this description.)

During the mapref preprocessing stage, the presence of the @keys attribute causes a <keydef> element to be created in addition to a <submap> element for the referenced map contents:

3815_2

During the branch-filter preprocessing stage, the profiled "Product" key values are correctly pruned to reflect the <ditavalref>:

3815_3

During the keyref preprocessing stage, the <keydef> element is incorrectly processed to insert a redundant (and unfiltered) <submap> of "Product" values:

3815_4

When preprocess2 is used, because topics are processed after maps, the unfiltered "Product" key values are incorrectly used to resolve the key reference in the topic file, causing the ***ERROR*** value to be used in the output.

(Note that an extra preprocessing-only topic reference to preface.dita is also created, but it does not affect output in this testcase.)

Possible Solution

Somehow the second insertion of the <submap> (due to the <keydef>) must be prevented. I don't know enough about the keyref stage to know what element structures should/should not be created.

Steps to Reproduce

  1. Unarchive the following testcase: ditaot_mapref_keys_bug.zip
  2. Run the pdf2 transformation: dita -i map.ditamap -f pdf2

Environment

  • DITA-OT version: 3.7
  • Operating system and version: Ubuntu 20.04
  • How did you run DITA-OT? dita command
  • Transformation types: pdf2

chrispy-snps avatar Oct 27 '21 17:10 chrispy-snps

Unfortunately, this bug also affects preprocess. When force-unique is set to true, duplicate copies of all topics in the submap are created. For example, if I add two subtopics to preface.ditamap, I get:

$ rm -rf out
$ dita -i map.ditamap -f html5 -o out -Dforce-unique=false
$ ls -1 out/*.html
out/index.html
out/preface.html
out/topic.html
out/preface_subtopicA.html
out/preface_subtopicB.html

$ rm -rf out
$ dita -i map.ditamap -f html5 -o out -Dforce-unique=true
$ ls -1 out/*.html
out/index.html
out/preface.html
out/preface_2.html   <-- duplicate
out/preface_subtopicA.html
out/preface_subtopicA_2.html   <-- duplicate
out/preface_subtopicB.html
out/preface_subtopicB_2.html   <-- duplicate
out/topic.html

This occurs because copy-to sees the duplicate <submap> and uniquifies its contents.

chrispy-snps avatar Jul 06 '22 19:07 chrispy-snps

Also, some related discussion here:

DITA Users - "Does a key reference to a <mapref> have any meaning?"

chrispy-snps avatar Jul 06 '22 19:07 chrispy-snps

The <submap> element is added by this stylesheet "/plugins/org.dita.base/xsl/preprocess/maprefImpl.xsl" in the template:

   <xsl:template match="*[contains(@class, ' map/topicref ')][(@format, @dita-ot:orig-format) = 'ditamap']" priority="10">

basically the stylesheet is applied somehow twice, in two stages, first in the "mapref:" stage, then branch filtering is applied and then the "keyref:" stage calls again the mapref stage possibly because you can use keyref to refer to DITA Maps so the stylesheet gets applied the second time and adds the extra submap element. The task named "keyref" in the DITA OT build files also calls a "mapref" task after it resolves the keys, probably to parse extra keyref's to DITA Maps which have just been resolved.

raducoravu avatar Jul 14 '22 10:07 raducoravu

Looking at the build_preprocess.xml at the keyref target:

	  <target name="keyref" depends="" unless="preprocess.keyref.skip" description="Resolve keyref">
	    <pipeline ...
	    <pipeline message="Resolve mapref in ditamap" taskname="mapref">
	      <xslt basedir="${dita.temp.dir}" reloadstylesheet="${dita.preprocess.reloadstylesheet.mapref}" style="${dita.plugin.org.dita.base.dir}/xsl/preprocess/mapref.xsl" filenameparameter="file-being-processed">
	        <ditafileset format="ditamap" input="true"/>
	        <param name="TRANSTYPE" expression="${transtype}"/>
	        <param name="child-topicref-warning" expression="false"/>
	        <param name="keep-submap-href" expression="false"/>
	        
	        <xmlcatalog refid="dita.catalog"/>
	      </xslt>
	    </pipeline>
	  	<fail/>
	  </target>

it seems to pass this "keep-submap-href" parameter which is not used at all in the maprefImpl.xsl.

In my opinion a proper fix for this would imply:

  1. In the maprefImpl.xsl define that parameter:

     <xsl:param name="keep-submap-href" as="xs:string" select="'true'"/>
    

and filter the initial submaps based on it:

 <xsl:template match="submap[$keep-submap-href='false']" priority="100"/>
  1. In the build_preprocess.xml where the "keyref" target is defined, after calling mapref we should call again the branch filtering module:

     	  <target name="keyref" depends="" unless="preprocess.keyref.skip" description="Resolve keyref">
     	    <pipeline message="Resolve keyref." taskname="keyref"....>
     	    <pipeline message="Resolve mapref in ditamap" taskname="mapref">...>
     	 <pipeline taskname="branch-filter" message="Filter branches">
     	    <module class="org.dita.dost.module.BranchFilterModule"/>
     	  </pipeline>
     	  	<fail/>
     	  </target>
    

but ideally we could call it only on the DITA Maps because it can probably induce some performance problems. And this would fix probably also #3758

raducoravu avatar Jul 14 '22 11:07 raducoravu

Looking at the many things the BranchFilter module does (for example it also renames topicrefs) probably only the "org.dita.dost.module.BranchFilterModule.filterBranches(Element, List<FilterUtils>, QName[][])" part from it would be useful in this case.

raducoravu avatar Jul 14 '22 11:07 raducoravu

Hi in preprocess2, I tested by changing order to run map-keyref before the map-branch-filter, the output looks as expected. image

but I not sure will this causing other impact......

SoonBoonChin avatar Aug 17 '22 06:08 SoonBoonChin

@chrispy-snps about this comment related to HTML5 processing with force-unique=true: https://github.com/dita-ot/dita-ot/issues/3815#issuecomment-1176570272 with this pull request merged in https://github.com/dita-ot/dita-ot/pull/4003 I cannot reproduce the problem with having duplicate HTML files in the output folder.

raducoravu avatar Sep 20 '22 10:09 raducoravu

@SoonBoonChin - thank you for sharing your findings! I am always hesitant to change the default ordering of preprocessing steps for our production processing, but I think it's good information that the cause of the bug is some kind of order dependency.

@raducoravu - interesting! I took a closer look. In the map-keyref figure in the issue description, the questionable submap copy of preface.ditamap (shown in red) has processing-role="resource-only", which causes the duplicate topics. The fix suppresses these duplicates, which is a nice practical benefit until the @keys processing itself can be fixed.

I also confirmed that even with the #3815/#4003 fix, the preprocess2 bug for incorrect key resolution still occurs, as it is an independent bug caused by the duplicate <submap>.

chrispy-snps avatar Sep 20 '22 11:09 chrispy-snps

Hi @chrispy-snps @raducoravu

I did the order changing and it seems works fine. (at least in preprocess2) Just wanted to check if there is other concern on this?

I tried another alternative method, to run branch filter again after resolving keyref, this seems working also.

Radu can u assist on try to raise the simple pull request for the checks and test? ( I seems doesn't have the access to do this)

We wanted have this fix to get it work(at least as workaround), but afraid it might breaks other stuff

SoonBoonChin avatar Oct 05 '22 08:10 SoonBoonChin

@SoonBoonChin you can open a pull request yourself, anybody can open pull requests, you do not need to be a project admin for this: https://www.dita-ot.org/contributing#coding

We are not using the "preprocess2" stage by default with the PDF (CSS based on XSL-FO based) and WebHelp outputs generated from the DITA OT bundled with Oxygen. And the "preprocess" stage no longer has this problem after the fix I made here: https://github.com/dita-ot/dita-ot/pull/4003 So I'm not directly interested in further investigating a possible fix for this.

Also I believe the possible fix where you do keyref processing before branch filtering has the potential to generate side effects, even if all the auto tests will work.

raducoravu avatar Oct 05 '22 08:10 raducoravu