TTML text parsing issues with new lines

Open shlompy opened this issue 2 years ago • 2 comments

Hi.

I'm trying to parse the following TTML snippet:

<?xml version="1.0" encoding="UTF-8"?><tt xmlns:smpte="http://www.smpte-ra.org/schemas/2052-1/2010/smpte-tt" xmlns="http://www.w3.org/ns/ttml" xmlns:ttm="http://www.w3.org/ns/ttml#metadata" xmlns:tts="http://www.w3.org/ns/ttml#styling" xml:space="default" xml:lang="eng"><head>
    <metadata>
      <ttm:title/>
    </metadata>
    <styling>
<style xml:id="style.center.outline" xmlns:tts="http://www.w3.org/ns/ttml#style" tts:fontFamily="Arial" tts:fontSize="100%" tts:fontStyle="normal" tts:fontWeight="normal" tts:backgroundColor="transparent" tts:color="white" tts:textOutline="black 2px" tts:textAlign="center"/>
    </styling>
    <layout>
      <region xml:id="r0" tts:displayAlign="after" tts:origin="10% 75%" tts:extent="80% 20%"/>
    </layout>
  </head><body>
  <div>
  <p style="style.center.outline" begin="00:22:31.000" region="r0" xml:id="p264" end="00:22:33.720" ><span tts:direction="ltr">Got you!<br/>Steady on.</span></p>
  </div></body></tt>


It seems that the subtitle text is parsed without the line break. The text is unmarshalled as XML chardata:

type TTMLInItem struct {
	Text string `xml:",chardata"`
...
}

This results in the following string: "Got you!Steady on."

ttml.go has the following comment in the code:

// New line decoded as a line break. This can happen if there's a "br" tag within the text since
// since the go xml unmarshaler will unmarshal a "br" tag as a line break if the field has the
// chardata xml tag.

But the Go XML unmarshaler doesn't actually seem to convert the br tag into a new line. Perhaps this used to be true in older Go versions? (I'm using Go 1.18.5.)
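For anyone who wants to reproduce this outside the library, here is a minimal sketch using only encoding/xml. The spanItem type and decodeSpan helper are hypothetical stand-ins I made up for illustration (they are not part of go-astisub); the only thing carried over from TTMLInItem is the chardata field tag.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// spanItem is a hypothetical stand-in for TTMLInItem, reduced to the
// single chardata field that matters for this issue.
type spanItem struct {
	Text string `xml:",chardata"`
}

// decodeSpan unmarshals one element into spanItem and returns its text.
func decodeSpan(src string) (string, error) {
	var it spanItem
	if err := xml.Unmarshal([]byte(src), &it); err != nil {
		return "", err
	}
	return it.Text, nil
}

func main() {
	text, err := decodeSpan(`<span>Got you!<br/>Steady on.</span>`)
	if err != nil {
		panic(err)
	}
	// The <br/> child has no matching field, so the decoder skips it and
	// only the surrounding character data is concatenated.
	fmt.Printf("%q\n", text) // prints "Got you!Steady on."
}
```

The output matches the string reported above: the two text runs are joined with nothing in between.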

shlompy avatar Jan 24 '23 15:01 shlompy

The problem is that this lib apparently doesn't properly handle <br/> inside <span> tags.
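One possible direction, sketched with plain encoding/xml: walk the token stream and emit "\n" whenever a br start tag shows up, whatever the nesting depth. The textWithBreaks function is a hypothetical helper written for this comment; it is not how ttml.go is structured and not necessarily what an eventual fix would look like.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// textWithBreaks (hypothetical helper) returns all character data found in
// src, writing "\n" for every <br> start tag at any nesting depth.
func textWithBreaks(src string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	var sb strings.Builder
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return sb.String(), nil
		}
		if err != nil {
			return "", err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			// A self-closing <br/> arrives as a StartElement followed by
			// an EndElement, so handling the start tag is enough.
			if t.Name.Local == "br" {
				sb.WriteByte('\n')
			}
		case xml.CharData:
			sb.Write(t)
		}
	}
}

func main() {
	text, err := textWithBreaks(`<p><span>Got you!<br/>Steady on.</span></p>`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", text) // prints "Got you!\nSteady on."
}
```

Note that this keeps every piece of character data, so indentation and newlines between elements in a pretty-printed file would also end up in the result and would need trimming.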

PRs are welcome.

Cheers

asticode avatar Jan 25 '23 13:01 asticode

I created a PR: https://github.com/asticode/go-astisub/pull/106, please take a look. Thank you!

NhanNguyen700 avatar May 31 '24 03:05 NhanNguyen700