gpt-subb

Only the first line of multiline subtitles is translated.

Open xpufx opened this issue 2 years ago • 6 comments

Input:

12 00:03:06,733 --> 00:03:11,832

  • ¿Está a 2000 metros de altitud?
  • ¡No, doctor!

13 00:03:11,852 --> 00:03:15,871

  • ¡Quizá 1950!
  • ¿Se puede llegar allí con Jeep?

Output:

12 00:03:06,733 --> 00:03:11,832

  • 2000 metre yükseklikte mi?

13 00:03:11,852 --> 00:03:15,871

  • Belki 1950!
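For comparison, here is a minimal Python sketch of cue handling that keeps every text line of a cue, so both lines of cues 12 and 13 would survive translation. It is not gpt-subb's actual code and makes no claim about where the bug is. It assumes the standard SRT layout (index line, timing line, then one or more text lines, with cues separated by blank lines). The names `Cue`, `parse_srt`, `translate_cues` and `render_srt` are hypothetical, and `translate` is a stand-in for whatever call does the real translation.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Cue:
    index: str        # cue number, kept as text
    timing: str       # "HH:MM:SS,mmm --> HH:MM:SS,mmm"
    lines: List[str]  # every text line of the cue, not just the first


def parse_srt(text: str) -> List[Cue]:
    """Split SRT text into cues, keeping all text lines of each cue."""
    text = text.replace("\r\n", "\n").strip()
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text):
        rows = block.splitlines()
        if len(rows) < 2:
            continue  # skip blocks without an index and a timing line
        cues.append(Cue(rows[0].strip(), rows[1].strip(), rows[2:]))
    return cues


def translate_cues(cues: List[Cue], translate: Callable[[str], str]) -> List[Cue]:
    """Apply the (hypothetical) translate function to each text line."""
    return [Cue(c.index, c.timing, [translate(line) for line in c.lines]) for c in cues]


def render_srt(cues: List[Cue]) -> str:
    """Write cues back out in SRT layout."""
    return "\n\n".join("\n".join([c.index, c.timing, *c.lines]) for c in cues) + "\n"


SAMPLE = """12
00:03:06,733 --> 00:03:11,832
- ¿Está a 2000 metros de altitud?
- ¡No, doctor!

13
00:03:11,852 --> 00:03:15,871
- ¡Quizá 1950!
- ¿Se puede llegar allí con Jeep?
"""

if __name__ == "__main__":
    # str.upper stands in for a real translation call.
    print(render_srt(translate_cues(parse_srt(SAMPLE), str.upper)))
```

A quick check for this kind of regression is to compare the number of text lines per cue before and after translation; they should match.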

xpufx avatar Mar 27 '23 10:03 xpufx

Hello @xpufx

Could you also provide all the parameters you are passing as arguments?

SkyaTura avatar Mar 27 '23 11:03 SkyaTura

gpt-subb -k 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 -l tr Il.Giovane.Montalbano.S01E01.La.prima.indagine.di.Montalbano.srt

(I will kill the api key now. no problem)

xpufx avatar Mar 27 '23 11:03 xpufx

@xpufx could you please verify whether this also happens with multi-line messages that don't have numbers mixed with text?

Also thanks for your collaboration

SkyaTura avatar Mar 28 '23 13:03 SkyaTura

I tried a little snippet from English to Turkish. Similar situation. I am attaching the files below. (The timestamps are correct for some show, but I changed the text to nonsensical stuff just in case. I tried to keep the format the same in case there are nonprintable characters I am not seeing.) I added a .txt extension so that GitHub allows the uploads.

Source. genericsubtitle.srt.txt

Result. genericsubtitle.tr.srt.txt

xpufx avatar Mar 28 '23 22:03 xpufx

@SkyaTura I can confirm that the problem still exists :/

ghost avatar Apr 28 '23 07:04 ghost

Sorry folks, I have had no time to check this yet. However, I have become more familiar with the OpenAI API, and I already know what may be going on. I also understand tokenization better now.

That being said, I'll refactor this project for more consistent results and better cost efficiency as well.

SkyaTura avatar Apr 28 '23 13:04 SkyaTura