Speech accents

Open · Fogapod opened this issue 2 years ago · 1 comment

Summary

The accent system modifies speech before it is sent to chat, to simulate speech defects or status effects. Text replacement rules are defined using a custom format.

Motivation

While it is possible to type any accent manually, it is handy to have an automatic system. Additionally, accents can act as limitations, like vision, hearing and other impairments.

A custom format should simplify accent creation by focusing on rules.

The result should at least reach feature parity with Unitystation accents; otherwise it is not worth the effort.

Guide-level explanation

Accents modify player speech in chat. Multiple accents can be applied on top of each other, making the message much less comprehensible.

Accents can be acquired in multiple ways: accent(s) selected during character creation, worn items (clown mask), status effects (alcohol consumption, low health) and maybe others.

Replacements are found in multiple passes. Each pass inside an accent has a name and consists of multiple rules, which are combined into a single regex. A rule says what to replace and with which tag. The simplest example of a rule is: replace hello with Literal("bonjour"). Literal is one of the tags; it replaces the original match with the given string. Note that hello is actually a regex pattern, so more complex things can be matched.

Some of the tags are:

Original: does not replace (leaves original match as is)
Literal: inserts the given string
Any: selects random inner replacement with equal weights
Upper: converts inner result to uppercase
Lower: converts inner result to lowercase
Concat: runs the left and right inner tags and joins their results

Some tags take other tags as arguments. For example, a rule that replaces hello with Upper(Literal("bonjour")) produces BONJOUR.
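To make the semantics of the default tags concrete, here is a minimal sketch that models them as a plain Rust enum. This is an assumption for illustration only: sayit uses a `Tag` trait (see the Tag trait section), and `DefaultTag`, `generate` and the `pick` callback are hypothetical names. Case mimicking and `$0` templating are left out, and `pick` stands in for the random choice made by `Any`.

```rust
/// Hypothetical enum model of the default tags (not the sayit API).
enum DefaultTag {
    Original,
    Literal(String),
    Any(Vec<DefaultTag>),
    Upper(Box<DefaultTag>),
    Lower(Box<DefaultTag>),
    Concat(Box<DefaultTag>, Box<DefaultTag>),
}

impl DefaultTag {
    /// `matched` is the text the rule's regex matched. `pick(n)` must return an
    /// index below `n`; it replaces a random number generator in this sketch.
    fn generate(&self, matched: &str, pick: &mut dyn FnMut(usize) -> usize) -> String {
        match self {
            // leave the match as is
            DefaultTag::Original => matched.to_string(),
            // ignore the match, emit the given string
            DefaultTag::Literal(text) => text.clone(),
            // choose one inner tag; `Any` must not be empty in this sketch
            DefaultTag::Any(options) => options[pick(options.len())].generate(matched, pick),
            // nested tags run first, then their result is transformed
            DefaultTag::Upper(inner) => inner.generate(matched, pick).to_uppercase(),
            DefaultTag::Lower(inner) => inner.generate(matched, pick).to_lowercase(),
            DefaultTag::Concat(left, right) => {
                let mut out = left.generate(matched, pick);
                out.push_str(&right.generate(matched, pick));
                out
            }
        }
    }
}
```

Passing the random choice in as a callback keeps the sketch deterministic and testable; a trait-based design like sayit's additionally lets users add new tags without touching a central enum.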

It is possible to define multiple intensity levels of an accent in the same file, so the accent can get progressively worse as intensity goes up. Intensity can either be assigned randomly or increase as the effect progresses (you get more drunk).

RON example:

```ron
// This accent adds honks at the end of your messages (regex anchor $).
// At intensity 1+ it adds more honks and UPPERCASES EVERYTHING YOU SAY.
(
    accent: {
        // `ending` pass. All regexes inside a pass are merged, so make sure to avoid overlaps.
        "ending": (
            rules: {
                // 1 or 2 honks at the default intensity of 0
                "$": {"Any": [
                    {"Literal": " HONK!"},
                    {"Literal": " HONK HONK!"},
                ]},
            },
        ),
    },
    intensities: {
        1: Extend({
            // merges with the `ending` pass from the accent body (implicitly intensity 0)
            "ending": (
                rules: {
                    // overwrite "$" to produce 2 or 3 honks
                    "$": {"Any": [
                        {"Literal": " HONK HONK!"},
                        {"Literal": " HONK HONK HONK!"},
                    ]},
                },
            ),
            // placed at the end as a new pass because `main` did not exist previously
            "main": (
                rules: {
                    // uppercase everything you say
                    ".+": {"Upper": {"Original": ()}},
                },
            ),
        }),
    },
)
```

Reference-level explanation

General structure

An accent consists of 2 parts:

accent: the definition of intensity 0
intensities: a map from intensity level to either Extend or Replace, each wrapping an intensity definition with the same structure as accent

Accent passes are executed sequentially from top to bottom.
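A minimal sketch of sequential execution, assuming each pass can be modeled as a function from text to text. The `Pass` alias and `run_passes` are hypothetical names; in sayit a pass is a merged regex with tags, and plain closures stand in for it here.

```rust
/// Hypothetical model: a pass is any function from text to text.
type Pass = Box<dyn Fn(&str) -> String>;

/// Runs the passes in declaration order (top to bottom); every pass receives
/// the output of the previous one, so later passes see earlier replacements.
fn run_passes(passes: &[Pass], input: &str) -> String {
    passes
        .iter()
        .fold(input.to_string(), |text, pass| pass(text.as_str()))
}
```

With a pass replacing a with b followed by a pass replacing b with c, "ab" becomes "cc"; swapping the two passes yields "bc". Pass order is therefore part of an accent's meaning.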

Regex patterns

Every pattern is compiled into a regex, so it has to be valid Rust regex syntax. While some features are missing (the regex crate supports neither look-around nor backreferences), the regex crate guarantees matching in time linear in the size of the input.

By default, every regex is compiled with the (?mi) flags (multi-line and case-insensitive); these can be opted out of inline, for example by writing (?-m).
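Because the rules of a pass end up in one regex, inline flags need some care: in Rust regex syntax, a flag group such as `(?-m)` applies until the end of the group it appears in. The sketch below assumes (this is not confirmed to be how sayit does it) that each rule is wrapped in its own named group before merging, so a rule's opt-out stays local to that rule. `merge_pass_patterns` and the `r0`, `r1`, ... group names are hypothetical.

```rust
/// Hypothetical helper (not part of sayit): joins the rule patterns of one
/// pass into a single alternation. Each rule gets its own named group, so an
/// inline flag like `(?-m)` inside a rule only affects that rule, and the
/// group name (`r0`, `r1`, ...) tells which rule matched.
fn merge_pass_patterns(patterns: &[&str]) -> String {
    let alternatives: Vec<String> = patterns
        .iter()
        .enumerate()
        .map(|(i, pattern)| format!("(?P<r{i}>{pattern})"))
        .collect();
    // The default flags apply to the whole merged pattern.
    format!("(?mi){}", alternatives.join("|"))
}
```

For the rules hello and (?-m)$ this produces (?mi)(?P<r0>hello)|(?P<r1>(?-m)$), where the (?-m) affects only the second rule.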

Regexes inside each pass are merged, which significantly improves performance (~54x improvement for the Scotsman accent with 600+ rules) but does not handle overlaps. Overlapping regexes must be placed into separate passes.
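The overlap limitation can be illustrated without the regex crate. This minimal sketch simulates one merged pass with a hypothetical `run_pass` function, using plain literals (compared ASCII-case-insensitively, mirroring the default (?i) flag) in place of regex patterns. At each position the rules are tried in declaration order, which matches the leftmost-first semantics of a regex alternation, and replaced text is not rescanned within the pass.

```rust
/// Simulates one merged pass. Rules are (pattern, replacement) pairs; patterns
/// are literals standing in for regexes. The first rule that matches at the
/// current position wins, like an alternation `p0|p1|...` in a regex.
fn run_pass(input: &str, rules: &[(&str, &str)]) -> String {
    let mut out = String::with_capacity(input.len());
    let mut pos = 0;
    while pos < input.len() {
        let rest = &input[pos..];
        let hit = rules.iter().find(|(pattern, _)| {
            !pattern.is_empty()
                && rest.len() >= pattern.len()
                && rest.as_bytes()[..pattern.len()].eq_ignore_ascii_case(pattern.as_bytes())
        });
        match hit {
            Some((pattern, replacement)) => {
                out.push_str(replacement);
                // Skip the matched text: it is never rescanned in this pass.
                pos += pattern.len();
            }
            None => {
                // No rule starts here: copy one whole (possibly multi-byte) character.
                let ch = rest.chars().next().unwrap();
                out.push(ch);
                pos += ch.len_utf8();
            }
        }
    }
    out
}
```

With the rules he then hello, "hello there" becomes "Xllo tXre": he shadows hello, which never fires. Listing hello first gives "Y tXre". Moving one of the rules into its own pass removes the dependency on rule order inside a pass.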

Case mimicking

Messages look much better if you copy the original letter case. If the user was SCREAMING, you want your replacement to scream as well. If the user Capitalized something, you ideally want to preserve that. Best-effort case mimicking is enabled for Literal. This currently includes:

do nothing if the input is fully lowercase
if the input is all uppercase, convert the output to full uppercase
if the input and output have the same length, copy the case of each letter

This is currently ASCII only!!
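The three rules can be sketched as a single function. This is one possible reading, not sayit's code: `mimic_case` is a hypothetical name, the checks run in the listed order, and mixed-case input whose length differs from the output (for example Hello to bonjour) is left untouched, since the listed rules do not cover that case.

```rust
/// Best-effort ASCII case mimicking. `input` is the matched text and
/// `output` is the replacement produced by a Literal.
fn mimic_case(input: &str, output: &str) -> String {
    let has_upper = input.chars().any(|c| c.is_ascii_uppercase());
    let has_lower = input.chars().any(|c| c.is_ascii_lowercase());

    // Rule 1: no uppercase letters in the input, keep the output as written.
    if !has_upper {
        return output.to_string();
    }
    // Rule 2: no lowercase letters, so the input is all uppercase: shout too.
    if !has_lower {
        return output.to_ascii_uppercase();
    }
    // Rule 3: mixed case and equal length: copy the case letter by letter.
    if input.chars().count() == output.chars().count() {
        return input
            .chars()
            .zip(output.chars())
            .map(|(i, o)| {
                if i.is_ascii_uppercase() {
                    o.to_ascii_uppercase()
                } else if i.is_ascii_lowercase() {
                    o.to_ascii_lowercase()
                } else {
                    o
                }
            })
            .collect();
    }
    // Anything else is left untouched.
    output.to_string()
}
```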

Regex templating

Regex provides a powerful templating feature for free: parts of a match can be captured into named or numbered groups and reused in the replacement. For example, Original behaves like Literal("$0"), where $0 expands to the entire regex match.
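To show what the expansion step does, here is a simplified, hypothetical `expand` function that handles only numeric groups (`$0`, `$1`, ...) and `$$` for a literal dollar sign. It is an illustration, not a replacement for the regex crate: there, `Regex::replace_all` and `Captures::expand` already do this, also support `$name` and `${name}`, and read `$1a` as the group named 1a (write `${1}a` for group 1 followed by a).

```rust
/// Simplified template expansion. `groups[0]` is the whole match; a group
/// that is unknown or did not participate expands to an empty string.
fn expand(template: &str, groups: &[Option<&str>]) -> String {
    let mut out = String::new();
    let mut chars = template.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '$' {
            out.push(c);
            continue;
        }
        // `$$` is an escaped dollar sign.
        if chars.peek() == Some(&'$') {
            chars.next();
            out.push('$');
            continue;
        }
        // Collect the group index after `$`.
        let mut digits = String::new();
        while let Some(d) = chars.peek().copied().filter(|d| d.is_ascii_digit()) {
            digits.push(d);
            chars.next();
        }
        if digits.is_empty() {
            // A lone `$` is kept as-is in this sketch.
            out.push('$');
        } else if let Some(Some(text)) = digits.parse::<usize>().ok().and_then(|i| groups.get(i)) {
            out.push_str(text);
        }
    }
    out
}
```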

Tag trait

There are multiple default tags, but when they are not enough, Tag can be implemented for your own type, which automatically allows deserializing it by the implementation name. An implementation of Tag could look like this (not final):

```rust
use std::borrow::Cow;

use sayit::{
    Accent,
    Match,
    Tag,
};

// Deserialize is only required with the `deserialize` crate feature
#[derive(Clone, Debug, serde::Deserialize)]
// transparent allows using `true` directly instead of `(true)`
#[serde(transparent)]
pub struct StringCase(bool);

// `typetag` is only required with the `deserialize` crate feature
#[typetag::deserialize]
impl Tag for StringCase {
    fn generate<'a>(&self, m: &Match<'a>) -> Cow<'a, str> {
        let matched = m.get_match();
        // `true` uppercases the match, `false` lowercases it
        let converted = if self.0 {
            matched.to_uppercase()
        } else {
            matched.to_lowercase()
        };
        converted.into()
    }
}

fn main() {
    // construct an accent that uppercases all instances of "a" and lowercases all "b"
    let accent = ron::from_str::<Accent>(
        r#"
(
    accent: {
        "main": (
            rules: {
                "a": {"StringCase": true},
                "b": {"StringCase": false},
            }
        ),
    }
)
"#,
    )
    .expect("accent did not parse");

    // patterns are case-insensitive by default, so "A" and "B" match as well
    assert_eq!(accent.say_it("abab ABAB Hello", 0), "AbAb AbAb Hello");
}
```

Intensities

Default intensity is 0 and it is always present in an accent. Higher intensities can be declared in the optional top-level intensities map, keyed by intensity. The map is sparse, so levels can be skipped; the highest defined level that does not exceed the requested intensity is selected.

There are 2 ways to define an intensity:

Replace starts from scratch and only has its own set of rules. Extend recursively looks at lower intensities down to 0 and merges them together. If a pattern conflicts with an existing pattern from a lower level, it is replaced and keeps its relative position. All new rules are added at the end of their merged pass, and new passes are added after all existing ones.
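A minimal sketch of how Extend and Replace could be resolved, under stated assumptions: tags are reduced to plain strings, `Level`, `Passes`, `extend` and `resolve` are hypothetical names, and "the highest possible level" is read as the highest defined level not above the requested intensity.

```rust
use std::collections::BTreeMap;

// Hypothetical simplified model: a tag is just a string here.
type Rules = Vec<(String, String)>; // (pattern, tag), in declaration order
type Passes = Vec<(String, Rules)>; // (pass name, rules), in declaration order

/// One entry of the `intensities` map.
enum Level {
    Extend(Passes),
    Replace(Passes),
}

/// Merges `upper` into `base`. A pass or pattern that already exists is
/// overwritten in place (keeping its position); anything new is appended.
fn extend(mut base: Passes, upper: &Passes) -> Passes {
    for (name, rules) in upper {
        match base.iter_mut().find(|(existing_name, _)| existing_name == name) {
            Some((_, existing_rules)) => {
                for (pattern, tag) in rules {
                    match existing_rules.iter_mut().find(|(p, _)| p == pattern) {
                        Some((_, old_tag)) => *old_tag = tag.clone(),
                        None => existing_rules.push((pattern.clone(), tag.clone())),
                    }
                }
            }
            None => base.push((name.clone(), rules.clone())),
        }
    }
    base
}

/// Resolves the passes used at `requested` intensity; `base` is intensity 0.
fn resolve(base: &Passes, levels: &BTreeMap<u64, Level>, requested: u64) -> Passes {
    if requested == 0 {
        return base.clone();
    }
    match levels.range(1..=requested).next_back() {
        // no level defined at or below `requested`: fall back to intensity 0
        None => base.clone(),
        // Replace ignores everything below it
        Some((_, Level::Replace(passes))) => passes.clone(),
        // Extend merges onto whatever the levels below it resolve to
        Some((&level, Level::Extend(passes))) => extend(resolve(base, levels, level - 1), passes),
    }
}
```

Mirroring the RON example, an Extend at level 1 that redefines "$" in `ending` overwrites that rule in place, and its `main` pass is appended after `ending`. Requests for intensity 2 then fall back to level 1 because level 2 is not defined.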

Drawbacks

Accent system as a whole

Some people might find accents annoying. Accents also impact server performance, by ~0.0001%.

Tag system performance

This is mostly mitigated by merging regexes.

~~List of regular expressions will never be as performant as static replacements. There are some potential optimizations like merging patterns without any regex escape codes or some smart way to run replacements in parallel, but list of static strings can be replaced efficiently.~~

~~Other aspect of tag system is layers which add some overhead unless compiled down but even then some tags might need nesting.~~

~~While these can be partially mitigated, it would increase code complexity significantly.~~

Memory footprint

Compiled regexes are fairly large. The Scotsman accent alone shows up as ~130 MB in the CLI tool on a release build, although I am not sure I measured it correctly.

Executable size / extra dependencies

The library was kept as minimal as possible, with 37 dependencies and a ~1.1M .rlib size. Size can be decreased further by disabling regex optimizations.

~~Due to complexity of deserializable trait and dependency on regex there are ~40 total dependencies in current WIP implementation and .rlib release file is ~1.2M (unsure if it's correct way to measure binary size).~~

Regex rule overlaps

This has been solved by regex passes.

~~It is harder (or maybe even impossible) to detect overlaps between regex patterns as opposed to static strings. Users must be careful to not overwrite other rules.~~

Patterns overwrite words

This has been solved by regex passes.

~~This problem is essentially the same as previous one. Rules are executed top to bottom, words first and then patterns. It makes it hard or in some cases even impossible to adequately combine words and single/double character replacements.~~

Extreme verbosity

Even the simplest tags, like {"Literal": "..."}, are extremely verbose. Ideally I would want to deserialize String -> Literal, Vec<Box<dyn Tag>> -> Any and Map<u64, Box<dyn Tag>> -> Weights, but I have not found a way to do this yet. I am not sure it is possible.

Additionally, there is a lot of nesting. I tried my best to keep accents as flat as possible, but there is simply too much going on.

Rationale and alternatives

Accent system as a whole

The alternative to having accents is typing everything by hand all the time and hoping players roleplay status effects.

Tag system

As for the tag system, it can potentially express very complex patterns, including arbitrary code via custom Tag implementations that could in theory even make an HTTP request or run an LLM (let's not do that).

While being powerful and extensible, tag syntax remains readable.

Regex patterns

While being slower than static strings, regex is a powerful tool that can simplify many accents.

Prior art

Other games

SS13

As far as I know, BYOND stations usually use JSON files with rules.

This works but has limitations.

Unitystation

Unitystation uses a Unity-specific YAML asset format to define lists of replacements: words and patterns. After all replacements, custom code optionally runs.

Accent code: https://github.com/unitystation/unitystation/blob/be67b387b503f57c540b3311028ca4bf965dbfb0/UnityProject/Assets/Scripts/ScriptableObjects/SpeechModifier.cs

Folder with accents (see .asset files): https://github.com/unitystation/unitystation/tree/develop/UnityProject/Assets/ScriptableObjects/Speech

This is the same system as BYOND's and it has the same limitations.

SS14

Space Station 14 does not have any format. They define all accents in pure C#.

Spanish accent: https://github.com/space-wizards/space-station-14/blob/effcc5d8277cd28f9739359e50fc268ada8f4ea6/Content.Server/Speech/EntitySystems/SpanishAccentSystem.cs#L5

This is the simplest approach to implement, but it results in repetitive code that is harder to read and hard to keep uniform across different accents.

There is a helper method that handles word replacements with localization and case mimicking: https://github.com/space-wizards/space-station-14/blob/a0d159bac69169434a38500b386476c7affccf3d/Content.Server/Speech/EntitySystems/ReplacementAccentSystem.cs

Similar behaviour might be possible with a custom Tag implementation that looks up a localized string at creation time and seeds an internal Literal with it.

Unresolved questions

~~Tag trait!!!~~
~~How to integrate this with SSNT~~
~~Custom trait options/message passing/generic over settings - likely impossible~~
Do the benefits of the tag system outweigh the complexity that comes with it?
~~Minimal set of replacement tags~~
~~Maybe a way to completely redefine accent / extend it like default Unitystation behaviour where custom code runs after all rules~~ this is likely covered by passes/custom Tag implementations
~~How complex should be string case mimicking~~
The optimal way to do repetitions
Reusing data: you might want to add 2 items to array of 1000 words in next intensity level or use said array between multiple rules
~~Do tags need to have access to some state/context~~ not now

Future possibilities

The accent system could possibly be reused for a speech jumbling system: turning speech into junk for non-speakers. One (bad) example might be robot communications shown to humans as ones and zeros.

Nov 02 '23 14:11 Fogapod

I am currently working on a proof of concept for this tag system at https://git.based.computer/fogapod/sayit
