2017 icon indicating copy to clipboard operation
2017 copied to clipboard

Vanish Like: The Novel

Open cpressey opened this issue 6 years ago • 6 comments

This novel generator is in the vein of "mathematical exploration", rather like perm-comb-finder, which resulted in 3×C(21,3)+2×C(215,2)=50000: The Novel in 2014. So this will begin with an expository section before getting to the code and result.

Exposition

Apparently — I should note that my understanding here comes from a past age, from books I read before the Internet was ubiquitous, and I have not researched any of this using modern resources to confirm or deny or modify any of it — apparently the word ABRACADABRA is supposed to mean "vanish like the word", and in a magic charm, you'd write it like this:

ABRACADABRA
ABRACADABR
ABRACADAB
ABRACADA
ABRACAD
ABRACA
ABRAC
ABRA
ABR
AB
A

And whatever you're trying to banish would, by the power of sympathetic magic, also vanish, just like the word ABRACADABRA vanishes there, just like it says it will.

Now, this sort of vanishing trick seems amenable to computer generation. Say I wanted to start with a paragraph, and remove words one by one until they're all gone, and assemble a "vanishing novel" this way.

w.r.t. NaNoGenMo, the significant part here is not figuring out how to implement this; it's quite straightforward. The significant part is determining how many words the first paragraph should have so that the total number of words in the novel is as close to 50,000 as possible, without going under.

Now, ABRACADABRA has 11 letters, so the total number of letters in the charm shown above is 1+2+3+4+5+6+7+8+9+10+11 = 66.

OK, so we already know we have a computer available (I mean, this is NaNoGenMo, right) and addition is cheap and 50,000 is not that big, so we could write a brute-force solver like

110 T=0
120 I=1
130 T=T+I
140 IF T>=50000 THEN GOTO 170
150 I=I+1
160 GOTO 130
170 PRINT T

But that seems inelegant. Surely there's a closed form for "sum of integers between 1 and n", and that would be more elegant.

Indeed there is. Folklore has it that Gauss came up with it in class one day when the teacher assigned summing the numbers from 1 to 100 manually as a tedious punishment.

But I don't know if I should believe that, because folklore also has it that Gauss's IQ was 220, but I figured out the closed form for this in math class one day too (but because I was bored, not because I was looking for a way to save myself labour) and my IQ is certainly nowhere near 220. Heck, 220 doesn't even sound like a real IQ. I should probably look some of this stuff up.

But my point is, it's not that hard to figure out, I guess? If you feel so inclined, you could try to tackle it yourself before reading further.

The first observation is that when you are adding successive integers, you can visualize each integer n as a row of n blocks, and the sum as a stacking of these rows. Indeed, you can just look at that ABRACADABRA charm up there to see what I mean.

Now, you can plainly see, such a stacking fits in a square, in fact takes up roughly half the square, so the closed form must be roughly n²/2.

But - an area of exactly n²/2 would have a "smooth" diagonal from the upper-right corner to the bottom-left corner of that square. But the letters along the diagonal form a "jaggy" line. Each row overhangs the diagonal by half a character (sliced diagonally). And there are n rows. So we must add n/2 to our formula. Then we get

n²/2 + n/2

which can be rewritten as

(n² + n)/2

which can be rewritten as

(n)(n + 1)/2

which is often how this closed form is presented.

Verily! Now that we have over-explained something you could've just gone and looked up, back to applying this to NaNoGenMo. We want to solve

(n)(n + 1)/2 = 50000

We can multiply both sides by 2 to get

(n)(n + 1) = 100000

And then multiply the two terms on the LHS to get

n² + n = 100000

And then subtract 100000 from both sides to get

n² + n - 100000 = 0

which has a form which may look familiar; it's a quadratic equation. And we have a formula for solving one of those, for which I have no anecdote except that they tried to make us memorize it in that same math class but, while parts have stuck in my memory, like the ±, those parts had become hopelessly scrambled over time, so this one I did have to look up. Here it is:

n = (-b ± sqrt(b² - 4ac)) / 2a

In our case, a = 1, b = 1, and c = -100000. Plugging those in, we get

n = (-1 ± sqrt(1² - 4×1×-100000)) / 2×1

Simplifying

n = (-1 ± sqrt(400001)) / 2

Taking the square root, we see the two solutions are

n = (-1 + 632.456) / 2 n = (-1 - 632.456) / 2

That is,

n = {315.728, -316.728}

Now, while exploring what it might mean for a piece of text to have a negative number of words is a fascinating possibility for a conceptual exercise, I feel it might be out of scope for this entry, and we will discard the negative value. Similarly, exploring what it might mean for a piece of text to have a fractional number of words feels like it might be similarly out of scope, so we round up, and we're left with

n = 316

We plug it back in to see that

(316² + 316) / 2 = 50086

is indeed quite close to 50,000, and

(315² + 315) / 2 = 49770

shows that it is as close as you can get without going under. Yay!

So now, we write 316 words and that's a paragraph, and the computer can spit out 315 more paragraphs, each with one less word, and we put them all together, and we have a generated novel, fit for a NaNoGenMo green "completed" label.

Code

This is in Python 2.7. It takes the initial paragraph as input, as a text file, with one sentence per line.

import random
import sys

class Sentence(object):
    def __init__(self, s):
        s = s.strip()
        assert s.endswith(('.', '?'))
        self.ender = s[-1]
        self.words = s[0:-1].split(' ')

    def __str__(self):
        words = list(self.words)
        words[0] = words[0][0].upper() + words[0][1:]
        while words[-1].endswith(('.', ',', '?')):
            words[-1] = words[-1][0:-1]
        return ' '.join(words) + self.ender

    def reduce(self):
        del self.words[random.randint(0, len(self.words) - 1)]

if __name__ == '__main__':
    sentences = [Sentence(line) for line in sys.stdin]
    initial_word_count = sum([len(sentence.words) for sentence in sentences])
    assert initial_word_count >= 316, initial_word_count

    while sentences:
        print '  '.join([str(s) for s in sentences]) + '\n'
        random.choice(sentences).reduce()
        sentences = [s for s in sentences if s.words]

Result

Well, here is the thing, right? After writing this code and doing this writeup, I thought it would be easy to write a 316-word paragraph on the theme of vanishing to use as input for this generator. But I'm struggling with it. It's really a good thing I didn't try to do NaNoWriMo this year -- if I can't even get 316 words together, imagine how hard I'd fail with 50K.

Update: finished novel is here.

cpressey avatar Nov 16 '17 14:11 cpressey

Now, while exploring what it might mean for a piece of text to have a negative number of words is a fascinating possibility for a conceptual exercise, I feel it might be out of scope for this entry, and we will discard the negative value.

Fascinating indeed; do -50,000 words a negative novel make?

hugovk avatar Nov 16 '17 15:11 hugovk

AL: I think this issue should have a cyan "preview" label.

BETTY: I don't. He hasn't shown what the novel is going to look like.

AL: So if you show none of your code, only a few words of its output, you get a cyan "preview" label -- but if you show all of your code and none of its output, you don't?

BETTY: Don't be naïve, Al. No one cares about the code.

cpressey avatar Nov 17 '17 11:11 cpressey

Vanish Like: The Novel Vanish Like: The Vanish Like: Vanish

hugovk avatar Nov 17 '17 11:11 hugovk

Couldn't you find exactly 316 words of public domain material about vanishment or disappearance?

On Fri, Nov 17, 2017 at 6:50 AM Hugo [email protected] wrote:

Vanish Like: The Novel Vanish Like: The Vanish Like: Vanish

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/NaNoGenMo/2017/issues/107#issuecomment-345223880, or mute the thread https://github.com/notifications/unsubscribe-auth/AAd6GchgxjNdD8JwEeIaJaki4cMT7jyzks5s3XMggaJpZM4QgjMf .

enkiv2 avatar Nov 17 '17 11:11 enkiv2

Finished novel is here.

cpressey avatar Nov 20 '17 11:11 cpressey

Couldn't you find exactly 316 words of public domain material about vanishment or disappearance?

Not for this work, thanks for understanding.

cpressey avatar Nov 20 '17 11:11 cpressey