Question About Federal Register XML Structure

Open aelfric opened this issue 4 years ago • 3 comments

I'm a bit confused about the structure of CFR amendments as presented in the Federal Register XML. Most amendments seem to be structured like this:

<REGTEXT TITLE="..." PART="...">
<AMDPAR>Description of amendment</AMDPAR>
<SECTION>
...content of amendment
</SECTION>
</REGTEXT>

But I'm finding others where SECTION and REGTEXT are sibling elements instead of parent and child. For example, FR-2019-09-30 contains a snippet like this:


<SECTION>
--
 <SECTNO>§ 325.3 </SECTNO>
 <SUBJECT> [Removed and Reserved] </SUBJECT>
 </SECTION>
 <REGTEXT PART="325" TITLE="49">
 <AMDPAR>2. Remove and reserve § 325.3.</AMDPAR>
 </REGTEXT>

Is there supposed to be some semantic difference between these two markup structures?

aelfric, Nov 13 '19 23:11

These are both valid variations of document structure. The REGTEXT tag is a non-print tag used to extract the amendatory information necessary to update the CFR. An informational heading such as the one at § 325.3 didn't provide any information that wasn't already in the instruction, so the editor opted to place the REGTEXT tag below it. The heading shouldn't be considered part of a hierarchy with the REGTEXT.

llaplant, Nov 14 '19 15:11

Okay, so if we are trying to extract the update instructions, is it safe to search for the REGTEXT and AMDPAR elements? Or can there be amendment instructions that the editor opts not to wrap in either of those elements?

aelfric, Nov 14 '19 16:11

If you search for REGTEXT and AMDPAR tags, you should retrieve all the information needed for the instructions. Each instruction should have a REGTEXT tag wrapped around it. If for some reason the REGTEXT tag is missing, a query on the AMDPAR tag should still pick the instruction up.

llaplant, Nov 14 '19 18:11
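
For anyone landing here later, the advice in this thread could be sketched roughly as follows. This is only a minimal illustration, not an official parser: it uses Python's standard `xml.etree.ElementTree`, the function name `extract_instructions` and the `<ROOT>` wrapper are hypothetical, and the sample document is made up except for the § 325.3 snippet quoted above. It collects every AMDPAR, attaches the TITLE and PART of the enclosing REGTEXT when there is one, and ignores where any SECTION happens to sit.

```python
import xml.etree.ElementTree as ET

# Made-up sample. Only the § 325.3 SECTION/REGTEXT pair comes from the thread;
# the <ROOT> wrapper and the other two instructions are illustrative assumptions.
SAMPLE = """<ROOT>
<REGTEXT TITLE="49" PART="325">
<AMDPAR>1. Revise § 325.1.</AMDPAR>
<SECTION>
<SECTNO>§ 325.1</SECTNO>
<SUBJECT>Scope.</SUBJECT>
</SECTION>
</REGTEXT>
<SECTION>
<SECTNO>§ 325.3 </SECTNO>
<SUBJECT> [Removed and Reserved] </SUBJECT>
</SECTION>
<REGTEXT PART="325" TITLE="49">
<AMDPAR>2. Remove and reserve § 325.3.</AMDPAR>
</REGTEXT>
<AMDPAR>3. Remove § 325.5.</AMDPAR>
</ROOT>"""


def extract_instructions(xml_text):
    """Return one dict per AMDPAR, in document order.

    'title' and 'part' come from the enclosing REGTEXT and are None
    when the AMDPAR is not wrapped in a REGTEXT at all.
    """
    root = ET.fromstring(xml_text)

    # Map each AMDPAR element to the REGTEXT that contains it, if any.
    enclosing = {}
    for regtext in root.iter("REGTEXT"):
        for amdpar in regtext.iter("AMDPAR"):
            enclosing[amdpar] = regtext

    results = []
    for amdpar in root.iter("AMDPAR"):
        regtext = enclosing.get(amdpar)
        results.append({
            "title": regtext.get("TITLE") if regtext is not None else None,
            "part": regtext.get("PART") if regtext is not None else None,
            # itertext() also picks up text inside any inline child elements.
            "instruction": "".join(amdpar.itertext()).strip(),
        })
    return results


if __name__ == "__main__":
    for item in extract_instructions(SAMPLE):
        print(item)
```

Searching on AMDPAR first and treating REGTEXT as optional context mirrors the fallback described above: an instruction that lost its REGTEXT wrapper still shows up, just without TITLE and PART.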