pdf-association/pdf-issues#629

What does "enclosed" mean for natural language *Lang*?

Status: Open · petervwyatt opened this issue 3 months ago · 9 comments

ISO 32000-2:2020, 14.9.2.1 Natural Language Specification has this sentence:

> The language specified by a Lang entry shall apply to any content within content streams enclosed or referenced by the respective structure elements, to any ActualText, Alt, or E properties of the respective structure elements.

If a structure element (SE, Table 355) has a Lang entry, does that apply to the ActualText, Alt, or E entries present in the same dictionary, or only to those "enclosed" (i.e., only those of child SEs)?

If a marked-content sequence (MCS, BDC operator) with a /Span tag has a property dictionary with a Lang entry, does that apply to the ActualText, Alt, or E entries present in the same property dictionary, or only to those "enclosed" (i.e., between the BDC and EMC operators)?

I'm trying to understand more precisely what the word "enclosed" means in the above sentence.

petervwyatt, Sep 28 '25 00:09

IIRC the Lang applies to both the element itself AND to its children.

DuffJohnson, Sep 28 '25 10:09

I agree with Duff’s understanding. The language would apply to any ActualText, Alt or E present on that structure element (and its children).


mrbhardy, Sep 28 '25 13:09
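To make the reading agreed above concrete, here is a minimal Python sketch of how a processor might resolve the language for a structure element's own ActualText, Alt, or E entries and for its children. It assumes a hypothetical, simplified `StructElem` model (not any real library's API) and assumes the nearest Lang wins: the element's own entry, then its ancestors, then the document catalog's Lang.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StructElem:
    """Hypothetical, minimal stand-in for a structure element dictionary."""
    tag: str
    lang: Optional[str] = None         # the element's own /Lang entry
    actual_text: Optional[str] = None  # /ActualText
    alt: Optional[str] = None          # /Alt
    e: Optional[str] = None            # /E
    # Excluded from repr/eq so the parent <-> kids cycle does not recurse.
    parent: Optional[StructElem] = field(default=None, repr=False, compare=False)
    kids: list[StructElem] = field(default_factory=list)

    def add(self, kid: StructElem) -> StructElem:
        kid.parent = self
        self.kids.append(kid)
        return kid


def effective_lang(elem: Optional[StructElem],
                   catalog_lang: Optional[str] = None) -> Optional[str]:
    """Language for elem's own ActualText/Alt/E and for the content it encloses.

    Checks elem's own /Lang first, then each ancestor in turn, and finally
    falls back to the document catalog's /Lang (an assumption of this sketch).
    """
    while elem is not None:
        if elem.lang is not None:
            return elem.lang
        elem = elem.parent
    return catalog_lang


doc = StructElem("Document", lang="en-US")
fig = doc.add(StructElem("Figure", lang="de-DE", alt="Ein Diagramm"))
cap = fig.add(StructElem("Caption", alt="Beschriftung"))
print(effective_lang(fig))  # de-DE: the Figure's own Lang covers its own Alt
print(effective_lang(cap))  # de-DE: inherited from the parent Figure
```

Checking the element itself before walking up is exactly the point in question: under the narrower "enclosed only" reading, the loop would start at `elem.parent` instead.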

Thanks.

I think an "and" is therefore missing from this sentence from the 1st bullet in 14.9.2.1:

> The language specified by a Lang entry shall apply to any content within content streams enclosed or referenced by the respective structure elements, $${\color{red}and}$$ to any ActualText, Alt, or E properties of the respective structure elements.

And the second bullet (MCS) needs a similar ending sentence to cover property list entries:

> Marked-content sequences that are not in the structure hierarchy (see 14.6, "Marked content"), through a Lang entry in a property list attached to the marked-content sequence with a Span tag. $${\color{red}The \space language \space specified \space also \space applies \space to \space entries \space in \space the \space respective \space property \space list.}$$

petervwyatt, Sep 28 '25 23:09
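Under the proposed second-bullet wording, a Span property list's Lang would cover both the content between BDC and EMC and the ActualText, Alt, or E entries in that same property list. The sketch below illustrates that reading over a hypothetical, pre-parsed operator list; `resolve_mc_languages` and its tuple format are made up for illustration (no real content-stream parser is involved), and, following the bullet, only Span sequences have their Lang honored.

```python
from __future__ import annotations

from typing import Optional


def resolve_mc_languages(ops: list[tuple],
                         default_lang: Optional[str] = None) -> list[tuple]:
    """Return (source, text, lang) triples for shown text and for the
    ActualText/Alt/E entries of marked-content property lists.

    Ops are hypothetical pre-parsed tuples:
      ("BDC", tag, props) | ("BMC", tag) | ("EMC",) | ("Tj", text)
    """
    stack = [default_lang]  # language in effect at each nesting level
    out = []
    for op in ops:
        name = op[0]
        if name == "BDC":
            _, tag, props = op
            lang = stack[-1]
            if tag == "Span" and "Lang" in props:
                lang = props["Lang"]
            stack.append(lang)
            # Proposed reading: the Lang also covers entries in the same
            # property list, not only the content between BDC and EMC.
            for key in ("ActualText", "Alt", "E"):
                if key in props:
                    out.append((key, props[key], lang))
        elif name == "BMC":
            stack.append(stack[-1])  # no property list, so nothing changes
        elif name == "EMC":
            if len(stack) == 1:
                raise ValueError("EMC without matching BDC/BMC")
            stack.pop()
        elif name == "Tj":
            out.append(("Tj", op[1], stack[-1]))
    return out


ops = [
    ("Tj", "Hello "),
    ("BDC", "Span", {"Lang": "fr-FR", "ActualText": "Bonjour"}),
    ("Tj", "Bonj"),
    ("EMC",),
    ("Tj", " bye"),
]
for row in resolve_mc_languages(ops, default_lang="en-US"):
    print(row)
```

Under the narrower "enclosed only" reading, the ActualText triple would instead be tagged with the language in effect before the BDC (here `en-US`).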

Fix to the 2nd bullet feels clunky, but it's clear enough IMO.

DuffJohnson, Sep 29 '25 02:09

> I think an "and" is therefore missing from this sentence from the 1st bullet in 14.9.2.1:

> The language specified by a Lang entry shall apply to any content within content streams enclosed or referenced by the respective structure elements, and to any ActualText, Alt, or E properties of the respective structure elements.

IMHO it is a bit unclear whether the Lang also applies to the ActualText of child structure elements.

Why does the text say "enclosed or referenced" at all? Which content in a content stream is referenced by a structure element but not enclosed?

And looking at Table 355: the entry for Lang is missing the option to override the language of a text string such as Alt or ActualText with a language escape sequence (7.9.2.2.2):

> (Optional; PDF 1.4) A language identifier specifying the natural language for all text in the structure element except where overridden by language specifications for nested structure elements or marked-content (see 14.9.2, "Natural language specification").

u-fischer, Sep 29 '25 08:09
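For the escape-sequence override mentioned above, a rough sketch of splitting a text string such as Alt or ActualText into language runs might look like this. It assumes a UTF-16BE text string and our reading of 7.9.2.2.2: U+001B, a 2-letter ISO 639 language code, an optional 2-letter ISO 3166 country code, and a closing U+001B, with the new language applying until the next escape or the end of the string. The function names and the hyphenated `de` / `en-US` output form are illustrative choices, not taken from the standard.

```python
from __future__ import annotations

import re
from typing import Optional

# Assumed escape form (our reading of 7.9.2.2.2): U+001B, a 2-letter
# ISO 639 language code, an optional 2-letter ISO 3166 country code, U+001B.
_LANG_ESCAPE = re.compile("\x1b([A-Za-z]{2})([A-Za-z]{2})?\x1b")


def decode_text_string(raw: bytes) -> str:
    """Decode a PDF text string; this sketch only handles UTF-16BE (BOM FE FF)."""
    if raw.startswith(b"\xfe\xff"):
        return raw[2:].decode("utf-16-be")
    raise ValueError("only UTF-16BE text strings are handled in this sketch")


def split_lang_runs(text: str,
                    inherited: Optional[str]) -> list[tuple[str, Optional[str]]]:
    """Split text into (run, lang) pairs.

    Text before the first escape keeps the inherited language (e.g. from the
    structure element's Lang); each escape switches the language for the text
    that follows it, up to the next escape or the end of the string.
    """
    runs: list[tuple[str, Optional[str]]] = []
    lang, pos = inherited, 0
    for m in _LANG_ESCAPE.finditer(text):
        if m.start() > pos:
            runs.append((text[pos:m.start()], lang))
        code, country = m.group(1), m.group(2)
        lang = f"{code}-{country}" if country else code
        pos = m.end()
    if pos < len(text):
        runs.append((text[pos:], lang))
    return runs


raw_alt = b"\xfe\xff" + "Label: \x1bde\x1bBeschriftung".encode("utf-16-be")
print(split_lang_runs(decode_text_string(raw_alt), inherited="en-US"))
```

The `inherited` argument is where the structure element's (or Span's) Lang would come in, which is why the Table 355 description arguably needs to mention this override.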

> And looking at Table 355: the entry for Lang is missing the option to override the language of a text string like Alt or ActualText with a language escape sequence (7.9.2.2.2):

> (Optional; PDF 1.4) A language identifier specifying the natural language for all text in the structure element except where overridden by language specifications for nested structure elements or marked-content (see 14.9.2, "Natural language specification").

The easiest solution is to strike "for nested structure elements or marked-content" and replace the "(see 14.9.2 ...)" with something stronger, like "as defined in 14.9.2 ...", since duplicating requirements is bad practice and fraught. E.g.:

  • (Optional; PDF 1.4) A language identifier specifying the natural language for all text in the structure element except where overridden by language specifications as defined in 14.9.2, "Natural language specification".

petervwyatt, Oct 01 '25 23:10

> Which content in a content stream is referenced by a structure element but not enclosed?

Many types of content streams are "referenced by": e.g., Form XObjects via the Do operator, pattern cells, Type 3 glyph descriptions, ...

petervwyatt, Oct 01 '25 23:10
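The "referenced" case can be illustrated in the same spirit: text in a Form XObject invoked with Do picks up the language in effect at the point of the Do. This is a hedged sketch over a hypothetical model in which content streams are keyed by name and already tokenized; `collect_text_langs`, its tuple format, and the starting `lang` (standing in for the enclosing structure element's Lang) are assumptions for illustration only.

```python
from __future__ import annotations

from typing import Optional


def collect_text_langs(name: str,
                       streams: dict[str, list[tuple]],
                       lang: Optional[str],
                       active: frozenset[str] = frozenset()) -> list[tuple[str, Optional[str]]]:
    """Return (text, lang) pairs for a stream, following Do into referenced streams.

    Ops are hypothetical pre-parsed tuples:
      ("BDC", tag, props) | ("EMC",) | ("Tj", text) | ("Do", xobject_name)
    """
    if name in active:  # guard against cycles; treated as an error here
        raise ValueError(f"recursive Do of {name!r}")
    active = active | {name}
    stack = [lang]  # a referenced stream starts with the language in effect at the Do
    out: list[tuple[str, Optional[str]]] = []
    for op in streams[name]:
        if op[0] == "BDC":
            tag, props = op[1], op[2]
            # In this sketch only a Span's Lang changes the language.
            stack.append(props.get("Lang", stack[-1]) if tag == "Span" else stack[-1])
        elif op[0] == "EMC":
            if len(stack) == 1:
                raise ValueError("EMC without matching BDC")
            stack.pop()
        elif op[0] == "Tj":
            out.append((op[1], stack[-1]))
        elif op[0] == "Do":
            out.extend(collect_text_langs(op[1], streams, stack[-1], active))
    return out


streams = {
    "page": [("Tj", "Intro"), ("BDC", "Span", {"Lang": "de-DE"}),
             ("Do", "Fm1"), ("EMC",), ("Tj", "End")],
    "Fm1": [("Tj", "Hallo")],
}
print(collect_text_langs("page", streams, lang="en-US"))
```

Here "Hallo" lives in a separate stream that is only referenced, yet it is reported as `de-DE`, which is the behavior "enclosed or referenced" appears intended to capture.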

> The language specified by a Lang entry shall apply to any content within content streams enclosed or referenced by the respective structure elements, and to any ActualText, Alt, or E properties of the respective structure elements.

> IMHO it is a bit unclear whether the Lang also applies to the ActualText of child structure elements.

Maybe the last part of the sentence should be something like "..., and to any ActualText, Alt, or E properties of the respective structure elements or child structure elements."?

petervwyatt, Oct 01 '25 23:10

Assign to Reuse TWG to propose better wording.

petervwyatt, Oct 23 '25 20:10