plotmachines
Multiple issues had to be fixed to get this running:
- function run_batch in train.py
def run_batch(model, args, device, compute_loss_fct):
    for arg in args:
        if arg is not None:
            arg = arg.to(device)
    output = model(*args, device=device)
    allloss = compute_loss_fct(output, args[0], args[1])
    return allloss.mean()
The tensors in args never get moved to the CUDA device (the loop rebinds the local variable arg, not the list element), so subsequent functions fail almost immediately; this could never have run successfully on a CUDA device.
def run_batch(model, args, device, compute_loss_fct):
    i = 0
    for arg in args:
        if arg is not None:
            args[i] = arg.to(device)
        i += 1
    output = model(*args, device=device)
    allloss = compute_loss_fct(output, args[0], args[1])
    return allloss.mean()
seems to work fine.
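For what it's worth, the manual index counter can also be avoided with a list comprehension — a sketch of the same fix, not code from the repo:

```python
def run_batch(model, args, device, compute_loss_fct):
    # Move every non-None argument to the target device; rebuilding the
    # list avoids the manual counter while keeping the same behavior.
    args = [arg.to(device) if arg is not None else None for arg in args]
    output = model(*args, device=device)
    allloss = compute_loss_fct(output, args[0], args[1])
    return allloss.mean()
```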
- in preprocessing/README.MD:
2. Steps for extracting outlines
Run extract_outlines.py to extract the outline-labeled documents that can be used as input for fine-tuning PlotMachines models.
The output will provide you with a csv of the outlines and stories where each row is a paragraph from a story. The columns are:
- story id: our format is "storyid_{int}" with the {int} after the underscore being this paragraph's index in the story (starting at 0)
- key/abstract: this is a binary signifier for us to know where the data came from, but it's just "K" for every row in wikiplots
- outline: the outline with points delimited by [SEP]
- discourse tag: I/B/C for intro, body, conclusion paragraphs respectively
- num_paragraphs: total number of paragraphs in this story
- paragraph: the paragraph text
- previous paragraph: text from the previous paragraph in the story
the referenced plots and titles files need to be 'pre-processed' themselves to remove any newlines inside each record (replacing them with a space, for example) to avoid being garbled by extract_outlines.py - could not have worked as supplied.
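A minimal sketch of that clean-up, assuming the wikiplots layout where stories in the plots file are separated by <EOS> lines (the separator default and function name here are assumptions, not from the repo):

```python
def flatten_plots(raw: str, sep: str = "<EOS>") -> list[str]:
    # Split the raw plots file on the story separator, then collapse
    # the newlines (and runs of whitespace) inside each story into
    # single spaces so every story occupies exactly one line.
    stories = raw.split(sep)
    return [" ".join(story.split()) for story in stories if story.strip()]
```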
3. Steps for splitting into train/dev/test splits
Please use the splits from wikiplots_splits.txt to construct the train, validation and test datasets that were used in the paper. Note that some stories may need to be removed (marked "flagged") due to potentially offensive and/or harmful content.
#!/bin/bash
rm -f train_encoded.csv val_encoded.csv test_encoded.csv
echo "in script $1"
while read -r line
do
    inp=(${line})
    plot=${inp[0]}
    outfile=${inp[-1]}
    grep "${plot}" *RAKE*.csv >> "${outfile}_encoded.csv"
done < "$1"
This will do it from the generated wikiplot.kwRAKE.csv.
References to 'dev_encoded.csv' throughout should probably be 'val_encoded.csv'; the naming is somewhat inconsistent.
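A toy run of that loop (the file names and rows here are made up) shows how each line of the splits file routes matching rows into its _encoded.csv:

```shell
# Two fake RAKE-encoded rows and a two-line splits file.
printf 'plot-1_0,K,outline,I\nplot-2_0,K,outline,I\n' > toyRAKE.csv
printf 'plot-1 train\nplot-2 val\n' > toy_splits.txt

while read -r line; do
  inp=(${line})
  plot=${inp[0]}
  outfile=${inp[1]}   # last field; the real script uses ${inp[-1]}
  grep "${plot}" *RAKE*.csv >> "${outfile}_encoded.csv"
done < toy_splits.txt

# inspect the routed rows
cat train_encoded.csv
```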
Hi, I ran into the same preprocessing problem in "2. Steps for extracting outlines". Could you share the processed file? Thanks!
Hi, have you solved your problem? I've run into the same problem.
grep will work better here with a _ after ${plot} to avoid matching plot-10, plot-100, plot-101, etc... grep's -m 1 flag might accomplish the same thing, too:
#!/bin/bash
rm -f train_encoded.csv val_encoded.csv test_encoded.csv
echo "in script $1"
while read -r line
do
    inp=(${line})
    plot=${inp[0]}
    outfile=${inp[-1]}
    grep "${plot}_" *RAKE*.csv >> "${outfile}_encoded.csv"
done < "$1"
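The collision is easy to see with plain substring matching (the row ids here are made up):

```python
rows = ["plot-1_0,K", "plot-10_0,K", "plot-100_0,K"]

# A bare story id is a prefix of longer ids, so "plot-1" hits all rows.
assert [r for r in rows if "plot-1" in r] == rows

# Anchoring on the underscore that terminates the id fixes the match.
assert [r for r in rows if "plot-1_" in r] == ["plot-1_0,K"]
```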