JavascriptSubtitlesOctopus icon indicating copy to clipboard operation
JavascriptSubtitlesOctopus copied to clipboard

Implement lazy font loading, enable embedded font extraction

Open WeebDataHoarder opened this issue 3 years ago • 10 comments

Contains one commit (7a69b7fcdee7b66f698ec032d4bc42abe86d85fe) from #145.

These changes extract the embedded font, and add the option to enable lazy font loading (where font resources are mapped, but not downloaded until libass tries to use them).

Additionally, fallbackFont option can be used to override the default fallback font that gets used. This way, users can specify their preferred font, including CJK ones (for example Noto Sans CJK .ttc).

Enabling lazy font loading is done via lazyFontLoading set to true, but it defaults to false. This is necessary given some requirements from WASM to not fetch the whole file, specifically:

  • HEAD and Range request support
  • Access-Control-Expose-Headers: Accept-Ranges, Content-Length, Content-Encoding (at least)
  • No Content-Encoding set. If these conditions match, HEAD request will be issued and file will be accessed via Ranges as-needed. Otherwise content will be fetched as a whole file.

Additionally, this PR enables embedded font extraction support on libass side. This is split on a separate commit.

WeebDataHoarder avatar Jun 22 '22 09:06 WeebDataHoarder

How the output directory looks like now:

-rw-r--r-- 1 root root  46K jun 23 10:58 COPYRIGHT
-rw-r--r-- 1 root root 143K jun 23 10:58 default.woff2
-rw-r--r-- 1 root root  75K jun 23 10:58 subtitles-octopus.js
-rw-r--r-- 1 root root 394K jun 23 10:51 subtitles-octopus-worker.js
-rw-r--r-- 1 root root 4,5M jun 23 10:52 subtitles-octopus-worker-legacy.js
-rwxr-xr-x 1 root root 2,2M jun 23 10:51 subtitles-octopus-worker.wasm

WeebDataHoarder avatar Jun 22 '22 09:06 WeebDataHoarder

Removal of --no-heap-copy is a rebase conflict, will force-push

EDIT: yep, conflict was from rebasing this onto #145, now fixed

WeebDataHoarder avatar Jun 22 '22 10:06 WeebDataHoarder

Additional comment for --embed-file: since 3.1.3 (maybe an update could be worth it) it goes directly in memory, without efficiency loss. This is not an issue as mentioned on https://github.com/libass/JavascriptSubtitlesOctopus/pull/145#issuecomment-1163088861 given the few bytes that are being embedded.

WeebDataHoarder avatar Jun 23 '22 10:06 WeebDataHoarder

Commenting only on the embedded fonts part:

The js side makes a copy of the three handles at initialisation, see here: https://github.com/libass/JavascriptSubtitlesOctopus/blob/128be61fe30ec9039192c6dda91c714ef03051e7/src/pre-worker.js#L113-L115 I would guess those need to be updated when reiniting the handles, but it isn't immediately obvious to me where this happens.

Is the library handle, JSO is using internally, exposed to users somehow? There’s octObj.ass_library, but I'm not sure if JSO-users have access to that. If it is, some users might already be adding memory fonts or setting ass_set_extract_fonts manually. .. that is, assuming oct_add_font etc are even available to JSO users.

TheOneric avatar Jun 29 '22 16:06 TheOneric

As far as I know it's only exposed in the worker, probably for debugging reasons.

There are various spots where ass_track gets re-initialized in post-worker.js, setTrack() for example. ass_renderer currently does not get re-initialized.

Most of the oct_add_font or other stuff, is only available to direct users of the worker.js (not jso module interface).

I can update the code so these values get updated properly, or rid of them elsewhere? Even if desired, you can always inspect/grab these handles via self.octObj.ass_library in worker code / debugger.

WeebDataHoarder avatar Jun 29 '22 20:06 WeebDataHoarder

My line of thought was that if JSO users can get the internal library the change to also clear memory fonts (and reinit all handles) needs to be documented. If no one has access to it, its an internal change and the existing high level description of "allows to change the track" or similar is sufficient for the affected API functions.

There are various spots where ass_track gets re-initialized in post-worker.js, setTrack() for example.

I might have missed something, but grepping through the code it seems like those values are actually only getting assigned to and never used? (If that’s the case: why do they even exist?)

Most of the oct_add_font or other stuff, is only available to direct users of the worker.js (not jso module interface).

Sorry, can you explain howsuch a direct use for a JSO user might look like, so that oct_* are available? I'm trying to understand if there is any more conflict potential and how the scopes here are or are not separated.

Both oct_* and octObj are mentioned in 4.0.0’s release notes and appear to have been added in #64; CC: @TFSThiagoBR98.

TheOneric avatar Jun 29 '22 23:06 TheOneric

You haven't missed something :)

The only way I see to use these APIs are by code ran from the worker side specific, and only relevant to forks, which could already add such API. To be able to use that you need to instantiate the .wasm yourself, or be within the worker scope.

There is a kind of direct way to execute code, and that is via onCustomMessage / custom worker events. Sadly these are placeholders that are not set by default (and have no method to set the runner function from user code). You would still need worker modifications to use this.

Don't know what to tell you about the intent of those methods/functions/variables being exposed, but I don't see any way to use them at the moment without modifications.

I believe this topic should probably be a discussion elsewhere, for now I'll update the code to refresh these variables.

WeebDataHoarder avatar Jun 30 '22 13:06 WeebDataHoarder

Thanks, if there's no way an user can modify the library and renderer states or add memory fonts, then there’s nothing more to worry about regarding embedded fonts. Eventhough, the actually diff ended up quite concise, the interaction with the rest of JSO and the effects aren't, so how about a more verbose commit message similar to this? (also avoiding overly long lines):

Use embedded fonts

ASS subtitles can embedd fonts with a custom encoding in
their [Fonts] section. For historic reasons ass_set_extract_fonts
defaults to disabled, so we need to explicitly opt-in.
There’s currently also an issue, with embedded fonts outliving
their track, which can lead to indefinitely growing memory consumption
if not all memory fonts are cleared on track reinit.

However, ass_clear_fonts can only be safely called if the renderer
also has been released first. At this point it is simpler to just
reinit the whole library-renderer-track triplet and library inits
are not that costly anyway.

JSO never uses (non-embedded) memory fonts itself and does not expose
any way to add them so this does not constitute a user-visible change.
Even oct_add_font cannot be used by consumers of upstream JSO binaries.

Note: the JS pointers of the libass handles are updated in this patch,
      but it appears they are actually never used.

If you can update the commit message and move the embedded font commit before the lazy loading one, I'm going to merge the embedded font commit in ~24h if rcombs or TFSThiagoBR98 don't object.


The question of how to use the interface and dead code warrants a dedicated thread indeed. Only if there would have been some way to add memory fonts there had been direct relevance to the change at hand. (I just noticed createTrackMem also seems to be unused...)

TheOneric avatar Jun 30 '22 15:06 TheOneric

I have reworded the embedded fonts commit, and moved it before lazy font loading.


On the topic of dead code, I have cleaned up a few of those and refactored a few methods to de-duplicate some code. Might open a PR just for dead-code/feature elimination with those changes to start with.

WeebDataHoarder avatar Jun 30 '22 17:06 WeebDataHoarder

Merged embedded font commit.

TheOneric avatar Jul 01 '22 16:07 TheOneric