Rendering/display issues using fedregister.xsl.

Open rhdunn opened this issue 4 years ago • 22 comments

Hi,

I am using the federalregister.xsl stylesheet at https://www.govinfo.gov/bulkdata/FR/resources to render federal register documents (e.g. https://www.federalregister.gov/documents/2019/11/01/2019-23800/changes-to-applicability-thresholds-for-regulatory-capital-and-liquidity-requirements). This has various rendering issues compared to the HTML and PDF documents. I will detail the issues below as I investigate them.

Kind regards, Reece

rhdunn avatar Nov 07 '19 10:11 rhdunn

When rendering the CHED elements (e.g. the maintemp1 template on lines 740-749), the stylesheet is using <xsl:value-of select="."/>. However, the XML (e.g. https://www.federalregister.gov/documents/full_text/xml/2019/11/01/2019-23800.xml, lines 2345-2347) can contain:

<CHED H="1">Average<LI>unweighted</LI>
  <LI>amount</LI>
</CHED>

This gets rendered incorrectly as:

<th id="GPOHEADERS" class="CHED">Averageunweighted
  amount
</th>

The <xsl:value-of select="."/> should be replaced with <xsl:apply-templates/> on lines 746 and 755 (the maintemp1 and h2 named templates). This then generates the HTML:

<th id="GPOHEADERS" class="CHED">Average<span class="LI CHED-LI">unweighted</span>
  <span class="LI CHED-LI">amount</span>
</th>

This then needs the following CSS to display correctly:

.CHED-LI {display:block;}

rhdunn avatar Nov 07 '19 10:11 rhdunn

The table of contents does not get rendered by the stylesheet but is present in the HTML and PDF documents. That is, the FP elements in the EXTRACT element of the table of contents are not rendered. The title (a HD element) is rendered.

This also includes other FP elements that are not email addresses. For example, the "50.40(a) (19 respondents)" text after "Estimated average hours per response:" in https://www.federalregister.gov/documents/2019/11/01/2019-23800/changes-to-applicability-thresholds-for-regulatory-capital-and-liquidity-requirements.

The simplest fix for this is to include the following in the FP element template:

<xsl:if test="not($fpcontent1 = 'Email:')">
  <xsl:call-template name="apply-span"/>
</xsl:if>

This works, and matches the rendering of the HTML page, but does not match the rendering of the PDF document. Specifically, the sub-sections labelled A-Z are not indented relative to the sections with roman numeral numbering. That is, the FP elements with a SOURCE attribute set to FP1-2 are not indented. NOTE: This information is not added to the class in apply-span, so cannot currently have a CSS indent applied to those elements.

rhdunn avatar Nov 07 '19 14:11 rhdunn

The following FP element template adds the SOURCE attribute to the classes so they can be styled correctly:

  <xsl:template match="FP">
     <xsl:variable name="fpcontent1" select="substring(.,1,6)"/>
	  <xsl:choose>
		  <xsl:when test="$fpcontent1 = 'Email:'">
			  <xsl:text>Email: </xsl:text>
			  <xsl:variable name="fpcontent2" select="substring(.,8)"/>
			  <a>
				  <xsl:attribute name="href">
					  <xsl:text>mailto:</xsl:text>
					  <xsl:value-of select="$fpcontent2"/>
				  </xsl:attribute>
				  <xsl:value-of select="$fpcontent2"/>
			  </a>
		  </xsl:when>
		  <xsl:when test="./@SOURCE">
			  <xsl:variable name="collapseSource" select="./@SOURCE"/>
			  <span>
				  <xsl:attribute name="class">
					  <xsl:value-of select="name()"/>
					  <xsl:text> </xsl:text>
					  <xsl:value-of select="name(parent::*)"/>
					  <xsl:text>-</xsl:text>
					  <xsl:value-of select="name()"/>
					  <xsl:text> </xsl:text>
					  <xsl:value-of select="name(parent::*)"/>
					  <xsl:text>-</xsl:text>
					  <xsl:value-of select="$collapseSource"/>
				  </xsl:attribute>
				  <xsl:apply-templates/>
			  </span>
		  </xsl:when>
		  <xsl:otherwise>
			  <span>
				  <xsl:attribute name="class">
					  <xsl:value-of select="name()"/>
					  <xsl:text> </xsl:text>
					  <xsl:value-of select="name(parent::*)"/>
					  <xsl:text>-</xsl:text>
					  <xsl:value-of select="name()"/>
				  </xsl:attribute>
				  <xsl:apply-templates/>
			  </span>
		  </xsl:otherwise>
	  </xsl:choose>
  </xsl:template>

This results in the FP EXTRACT-FP EXTRACT-FP-2 class for level 1 ToC elements, and FP EXTRACT-FP EXTRACT-FP1-2 for the level 2 elements.

The A-Z elements can then be indented using the following CSS:

.EXTRACT-FP1-2 {margin-left:20pt;}

rhdunn avatar Nov 07 '19 14:11 rhdunn

The CSS rule for the .SUPLINF-HD3 class is missing a display:block; declaration. This likely applies to the other -HD3 based classes as well.
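
A minimal sketch of that change, assuming the existing .SUPLINF-HD3 declarations are left as they are and that the other affected classes follow the same PARENT-HD3 naming pattern. The attribute selector is an assumption for illustration, not something taken from the current stylesheet:

```css
/* Add the missing block display to the class named above. */
.SUPLINF-HD3 {display:block;}

/* Assumption: other HD3 classes are generated as PARENT-HD3, so a
   substring match on the class attribute would cover them as well. */
[class*="-HD3"] {display:block;}
```

The substring selector is broader than listing each class explicitly, so it would need checking against other documents before being used.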

rhdunn avatar Nov 07 '19 16:11 rhdunn

Thanks for letting us know about this, @rhdunn. We'll need to look into this to see that those proposed changes won't have a negative impact elsewhere.

jonquandt avatar Nov 07 '19 18:11 jonquandt

The GPOHEADERS and GPOH2HEADERS ids are generated multiple times, which is invalid -- ids should be unique. Therefore, they should be classes. Specifically, the CSS should be:

.GPOHEADERS {font-weight:bold;font-size:9pt;text-align:center;border-left-style:solid;border-right-style:solid;border-width:1px;border-bottom-style:solid;border-top-style:solid;border-width:1px;border-color:black;}
.GPOH2HEADERS {font-weight:bold;font-size:9pt;text-align:center;border-left-style:solid;border-right-style:solid;border-width:1px;border-bottom-style:solid;border-top-style:solid;border-width:1px;border-color:black;}

while the maintemp1 template should be:

  <xsl:template name="maintemp1">
     <xsl:for-each select="CHED">
        <th>
           <xsl:attribute name="class">
              <xsl:text>GPOHEADERS </xsl:text>
              <xsl:value-of select="name()"/>      
           </xsl:attribute>         
           <xsl:apply-templates/>
        </th>
     </xsl:for-each>
  </xsl:template>   

and the h2 template should be:

  <xsl:template name="h2">
     <xsl:for-each select="CHED[@H=2]">   
        <th class="GPOH2HEADERS"><xsl:apply-templates/></th>
     </xsl:for-each>
  </xsl:template>   

UPDATE 1: The other id="GPOHEADERS" attributes should then be class="GPOHEADERS".

rhdunn avatar Nov 11 '19 17:11 rhdunn

The headers in the GPO tables do not have leftmost/rightmost borders in either the PDF or HTML versions of the rules. This can be achieved (with the change from id to class) using the following CSS:

.GPOHEADERS:first-child {border-left:none;}
.GPOHEADERS:last-child, .GPOH2HEADERS:last-child {border-right:none;}

The table body borders also don't match the PDF or HTML versions. I haven't investigated this yet.

rhdunn avatar Nov 12 '19 11:11 rhdunn

The GPOTABLE class should have the display:table; style instead of the display:block; style. This is preventing the width:100% style from having an effect.
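
A sketch of the changed rule, showing only the two declarations discussed here. Any other declarations on the existing .GPOTABLE rule are assumed to stay as they are:

```css
/* Per the note above: with display:table instead of display:block,
   the existing width:100% is expected to take effect. */
.GPOTABLE {display:table;width:100%;}
```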

rhdunn avatar Nov 12 '19 11:11 rhdunn

For tables like "Table IV—Timeline for Initial Categorizations and Reporting Under the Final Rule" in https://www.federalregister.gov/d/2019-23662, some of the cells should span 2 columns, but are only spanning 1 column.

The fix for this is to add the following to the MyENT template after the class attribute:

<xsl:if test="./@A=01"><xsl:attribute name="colspan">2</xsl:attribute></xsl:if>

Additionally, the ROW template needs to be adjusted so that the NumOfENT variable is changed to:

<xsl:variable name="NumOfENT" select="count(child::ENT) + count(child::ENT[@A=01])"/>

If the A attribute can have a value other than 01, and indicates how many additional columns to span, then the MyENT template should have:

<xsl:if test="./@A"><xsl:attribute name="colspan"><xsl:value-of select="./@A + 1"/></xsl:attribute></xsl:if>

and the NumOfENT variable should be:

<xsl:variable name="NumOfENT" select="count(child::ENT) + sum(child::ENT/@A)"/>

rhdunn avatar Nov 12 '19 13:11 rhdunn

I've checked another document and it has the A attribute set to L01, so the following will be needed instead:

<xsl:variable name="NumOfENT" select="count(child::ENT) + count(child::ENT[@A=('01', 'L01')])"/>

and

<xsl:if test="./@A=('01', 'L01')"><xsl:attribute name="colspan">2</xsl:attribute></xsl:if>

rhdunn avatar Nov 12 '19 14:11 rhdunn

The table cells in the HTML and PDF documents do not use hashed borders. Instead, they have solid black borders down the middle and at the bottom. This can be achieved using the following CSS:

.ENT {border-left-style:solid;border-right-style:solid;border-top-style:none;border-bottom-style:none;}
.ENT:first-child {border-left:none;}
.ENT:last-child {border-right:none;}
.ROW:last-child > .ENT {border-bottom-style:solid;}

Additional borders appear to be governed by the RUL attribute on the ROW element. I don't currently have styles for these, so they would need to be provided before using this change.

UPDATE 1: The tables that have TNOTE elements will not display the bottom row border with the styles above. They need the following additional CSS:

tr:not(.ROW) > .TNOTE {border-top-style:solid;border-width:1px;padding-top:1em;}
tr:not(.ROW) + tr:not(.ROW) > .TNOTE {border-top-style:none;padding-top:3pt;}

The second style is for tables that have multiple TNOTE elements.

UPDATE 2: The inserted MyENT cells that pad the remaining columns need to make use of the ENT class so that their borders are correctly styled. This requires making MyENT a class (so the id CSS does not take precedence over the ENT class):

   <xsl:attribute-set name="td-list">
      <xsl:attribute name="class">MyENT ENT</xsl:attribute>
   </xsl:attribute-set>

with the #MyENT CSS rule renamed to .MyENT.

rhdunn avatar Nov 12 '19 14:11 rhdunn

The rules I have currently worked out are:

      <xsl:attribute name="class">
		 <xsl:choose>
			 <xsl:when test="./@RUL='rn,s'">ROW-RUL-NSBAR </xsl:when>
			 <xsl:when test="./@RUL='n,s'">ROW-RUL-NSBAR </xsl:when>
			 <xsl:when test="./@RUL='s'">ROW-RUL-SBAR </xsl:when>
		 </xsl:choose>
         <xsl:value-of select="name()"/>
      </xsl:attribute>

with the corresponding CSS:

.ROW.ROW-RUL-NSBAR > .ENT, .ROW.ROW-RUL-SBAR > .ENT {border-bottom-style:solid;}
.ROW.ROW-RUL-NSBAR > .ENT:first-child {border-bottom-style:none;}

rhdunn avatar Nov 12 '19 15:11 rhdunn

The EXPSTB attribute on a ROW element looks like it is applying to the colspan logic for an ENT element. Therefore, the NumOfENT variable in the ROW template should be calculated as:

<xsl:variable name="expstb" select="(./@EXPSTB, 0)[1]"/>
<xsl:variable name="NumOfENT" select="count(child::ENT) + count(child::ENT[@A=('01', 'L01')]) + $expstb"/>

and the ENT element td/@colspan attribute as:

<xsl:choose>
   <xsl:when test="../@EXPSTB and position()=1"><xsl:attribute name="colspan"><xsl:value-of select="../@EXPSTB + 1"/></xsl:attribute></xsl:when>
   <xsl:when test="./@A=('01', 'L01')"><xsl:attribute name="colspan">2</xsl:attribute></xsl:when>
</xsl:choose>

rhdunn avatar Nov 12 '19 16:11 rhdunn

With the above changes, Table 11 in https://www.federalregister.gov/d/2019-21250 is rendering almost correctly. The only issue is that on the HTML page the "Annual hours" and "Wage rate" columns don't have left/right borders between them (i.e. either side of the column with the '×' characters).

rhdunn avatar Nov 12 '19 16:11 rhdunn

@rhdunn - thanks for the detailed feedback.

We uploaded a new version of the fedregister.xsl stylesheet that addresses the issues that you raised on Friday:

Looking at and reviewing the updates from yesterday and today:

  • [ ] multiple GPOHEADERS and GPOH2HEADERS ids - some id="GPOHEADERS" attributes should be class="GPOHEADERS"
  • [ ] add left/right borders in PDF/HTML versions -- css change
  • [ ] GPOTABLE - should be display:table instead of display:block
  • [ ] Some table spanning issues - Table IV - Timeline for Initial Categorizations....
  • [ ] different values of A -- check this and previous comment
  • [ ] table cells shouldn't have hashed borders - needs further analysis, may impact RUL on ROW element.
  • [ ] ROW-RUL CSS updates

jonquandt avatar Nov 12 '19 19:11 jonquandt

Thanks for the update. I'll take a look tomorrow.

rhdunn avatar Nov 12 '19 19:11 rhdunn

The update looks good. Thanks.

rhdunn avatar Nov 13 '19 11:11 rhdunn

Images and formulas (GPH/GID and MATH/MID elements) are not displayed correctly, despite the images for the FR documents being available on the HTML pages. They can be rendered by using the following:

  <xsl:template match="GPH/GID">
     <img class="GPH-GID" src="https://s3.amazonaws.com/images.federalregister.gov/{.}/original.png" height="{concat(../@DEEP, 'px')}"/>
  </xsl:template>
  
  <xsl:template match="MATH/MID">
     <img class="MATH-MID" src="https://s3.amazonaws.com/images.federalregister.gov/{.}/original.png" height="{concat(../@DEEP, 'px')}"/>
  </xsl:template>

instead of the existing "Please see PDF for image/formula" messages.

See https://www.federalregister.gov/documents/2017/01/05/2016-30004/energy-conservation-program-test-procedures-for-central-air-conditioners-and-heat-pumps for an example with both images and formulae.

NOTE: This also applies to the CFR documents, but I don't know where those images are located, or if they are available online.

rhdunn avatar Nov 13 '19 15:11 rhdunn

The subscript/superscript elements in e.g. the Q and E variables on https://www.federalregister.gov/documents/2017/01/05/2016-30004/energy-conservation-program-test-procedures-for-central-air-conditioners-and-heat-pumps#page-1560 are not rendered as subscripts/superscripts by the stylesheet, but they are in the HTML.

Update 1: It looks like the following CSS:

.E-52 {font-size:6pt;vertical-align:sub;}
.APP {margin-top:12pt;margin-bottom:0pt;font-weight:bolder;font-size:12pt;display:block;width:100%;text-align:center;}
.SU, .E-51, .FTREF {font-size:6pt;vertical-align:top;}
.URL {font-style:italic;}

needs to be modified to become:

.E-52, .E-54 {font-size:6pt;vertical-align:sub;}
.APP {margin-top:12pt;margin-bottom:0pt;font-weight:bolder;font-size:12pt;display:block;width:100%;text-align:center;}
.SU, .E-51, .E-53, .FTREF {font-size:6pt;vertical-align:top;}
.URL, .E-53, .E-54 {font-style:italic;}

That is, E-54 looks like an italic version of E-52 (subscript text), and E-53 looks like an italic version of E-51 (superscript text).

rhdunn avatar Nov 13 '19 16:11 rhdunn

Those are all the issues I am aware of, although I haven't done a complete review of the XSLT rendering compared to the PDF and HTML output. As such, I don't expect to add any more rendering issues here.

rhdunn avatar Nov 14 '19 17:11 rhdunn

In the latest update, the class="GPOHEADERS" change in the maintemp1 template (line 764) introduced a bug -- the class attribute is duplicated. The template should be:

  <xsl:template name="maintemp1">
     <xsl:for-each select="CHED">
        <th>
           <xsl:attribute name="class">
              <xsl:text>GPOHEADERS </xsl:text>
              <xsl:value-of select="name()"/>      
           </xsl:attribute>
           <xsl:apply-templates/>
        </th>
     </xsl:for-each>
  </xsl:template>   

The other changes don't have that issue.

rhdunn avatar Dec 02 '19 10:12 rhdunn

Thank you for the feedback. We’ll take a look and adjust.

jonquandt avatar Dec 02 '19 12:12 jonquandt