TTML text parsing issues with new lines
Hi.
I'm trying to parse the following ttml snippet:
<?xml version="1.0" encoding="UTF-8"?><tt xmlns:smpte="http://www.smpte-ra.org/schemas/2052-1/2010/smpte-tt" xmlns="http://www.w3.org/ns/ttml" xmlns:ttm="http://www.w3.org/ns/ttml#metadata" xmlns:tts="http://www.w3.org/ns/ttml#styling" xml:space="default" xml:lang="eng"><head>
<metadata>
<ttm:title/>
</metadata>
<styling>
<style xml:id="style.center.outline" xmlns:tts="http://www.w3.org/ns/ttml#style" tts:fontFamily="Arial" tts:fontSize="100%" tts:fontStyle="normal" tts:fontWeight="normal" tts:backgroundColor="transparent" tts:color="white" tts:textOutline="black 2px" tts:textAlign="center"/>
</styling>
<layout>
<region xml:id="r0" tts:displayAlign="after" tts:origin="10% 75%" tts:extent="80% 20%"/>
</layout>
</head><body>
<div>
<p style="style.center.outline" begin="00:22:31.000" region="r0" xml:id="p264" end="00:22:33.720" ><span tts:direction="ltr">Got you!<br/>Steady on.</span></p>
</div></body></tt>
It seems that the subtitle text is parsed without a new line. The text is unmarshalled as xml chardata:
type TTMLInItem struct {
	Text string `xml:",chardata"`
	...
}
This results in the following string: "Got you!Steady on."
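A minimal sketch that reproduces this outside the library, assuming only the standard encoding/xml package. The span type and the decodeSpanText helper are hypothetical names used for illustration and are not part of go-astisub:

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// span is a hypothetical stand-in for the item struct: it captures only the
// element's own character data.
type span struct {
	Text string `xml:",chardata"`
}

// decodeSpanText unmarshals a single element and returns its chardata.
func decodeSpanText(src string) (string, error) {
	var s span
	if err := xml.Unmarshal([]byte(src), &s); err != nil {
		return "", err
	}
	return s.Text, nil
}

func main() {
	text, err := decodeSpanText(`<span>Got you!<br/>Steady on.</span>`)
	if err != nil {
		panic(err)
	}
	// The <br/> child is skipped, so no line break ends up in the text.
	fmt.Printf("%q\n", text) // "Got you!Steady on."
}
```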
ttml.go has the following comment in the code:
// New line decoded as a line break. This can happen if there's a "br" tag within the text since
// since the go xml unmarshaler will unmarshal a "br" tag as a line break if the field has the
// chardata xml tag.
But it doesn't seem that the Go xml unmarshaler actually converts the br tag into a new line. Perhaps this used to be true in older Go versions? (I'm using Go 1.18.5.)
The problem is that this lib apparently doesn't properly handle <br/> inside <span> tags.
I'm welcoming PRs.
Cheers
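For reference, one possible way to keep the line break is to walk the XML tokens manually instead of relying on chardata. This is only a sketch under that assumption: textWithLineBreaks is a hypothetical helper, not the library's code and not necessarily what a fix in go-astisub would look like.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// textWithLineBreaks concatenates all character data found in src and writes
// a "\n" for every <br> start element it encounters.
func textWithLineBreaks(src string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	var sb strings.Builder
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return sb.String(), nil
		}
		if err != nil {
			return "", err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			// <br/> yields a start and an end token; only the start counts.
			if t.Name.Local == "br" {
				sb.WriteByte('\n')
			}
		case xml.CharData:
			sb.Write(t)
		}
	}
}

func main() {
	text, err := textWithLineBreaks(`<p><span>Got you!<br/>Steady on.</span></p>`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", text) // "Got you!\nSteady on."
}
```

Note that this walk flattens nested elements, so any per-span styling would have to be tracked separately.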
I created a PR: https://github.com/asticode/go-astisub/pull/106, please take a look. Thank you!