
Multiple issues to get running:

jfrancis99 opened this issue 4 years ago • 3 comments

  1. function run_batch in train.py

def run_batch(model, args, device, compute_loss_fct):
    for arg in args:
        if arg is not None:
            arg = arg.to(device)  # BUG: rebinds the local name 'arg' only; args is never updated

    output = model(*args, device=device)
    allloss = compute_loss_fct(output, args[0], args[1])
    return allloss.mean()

The args tensors never actually get moved to the CUDA device (the loop only rebinds the local variable), so subsequent functions fail almost immediately; this could never have run successfully on a CUDA device. The following change:

def run_batch(model, args, device, compute_loss_fct):
    # update the list in place so the moved tensors are actually used;
    # enumerate keeps the index correct even when an arg is None
    for i, arg in enumerate(args):
        if arg is not None:
            args[i] = arg.to(device)
    output = model(*args, device=device)
    allloss = compute_loss_fct(output, args[0], args[1])
    return allloss.mean()

seems to work fine.
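
For illustration, a minimal standalone sketch of the underlying Python pitfall (the tensor values are arbitrary, and the device fallback is just so the snippet runs on CPU-only machines):

import torch

args = [torch.zeros(2), None, torch.ones(2)]
device = "cuda" if torch.cuda.is_available() else "cpu"

# Broken: rebinding the loop variable leaves the list entries untouched.
for arg in args:
    if arg is not None:
        arg = arg.to(device)

# Working: write the moved tensor back through its index.
for i, arg in enumerate(args):
    if arg is not None:
        args[i] = arg.to(device)
print(args[0].device)  # now reports the target device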

  2. in preprocessing/README.MD:

2. Steps for extracting outlines

Run extract_outlines.py to extract the outline-labeled documents that can be used as input for training the PlotMachines fine-tuning models.

The output will provide you with a csv of the outlines and stories where each row is a paragraph from a story. The columns are:

  • story id: our format is "storyid_{int}" with the {int} after the underscore being this paragraph's index in the story (starting at 0)
  • key/abstract: this is a binary signifier for us to know where the data came from, but it's just "K" for every row in wikiplots
  • outline: the outline with points delimited by [SEP]
  • discourse tag: I/B/C for intro, body, conclusion paragraphs respectively
  • num_paragraphs: total number of paragraphs in this story
  • paragraph: the paragraph text
  • previous paragraph: text from the previous paragraph in the story
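
For a quick sanity check that the extracted csv matches the column list above, something like the following should work (the underscored column names are illustrative; check the actual header, if any, written by extract_outlines.py):

import pandas as pd

df = pd.read_csv("wikiplot.kwRAKE.csv", header=None,
                 names=["story_id", "key_abstract", "outline", "discourse_tag",
                        "num_paragraphs", "paragraph", "previous_paragraph"])
print(df.head())
print(df["discourse_tag"].value_counts())  # expect only I/B/C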

the referenced plots and titles files need to be 'pre-processed' themselves to remove any newlines within each entry, replacing them with a ' ' (space), to avoid being garbled by extract_outlines.py - could not have worked as supplied.
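
For example, a minimal pre-processing sketch, assuming the WikiPlots dump format in which each plot spans several lines and stories are separated by an <EOS> line (file names are illustrative; apply it to whichever of the two files contains embedded newlines):

def flatten_stories(in_path, out_path):
    with open(in_path, encoding="utf-8") as fin, \
         open(out_path, "w", encoding="utf-8") as fout:
        buf = []
        for line in fin:
            line = line.strip()
            if line == "<EOS>":
                fout.write(" ".join(buf) + "\n")  # one story per output line
                buf = []
            elif line:
                buf.append(line)
        if buf:  # handle a final story without a trailing <EOS>
            fout.write(" ".join(buf) + "\n")

flatten_stories("plots", "plots.oneline")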

3. Steps for splitting into train/dev/test splits

Please use the splits from wikiplots_splits.txt to construct the train, validation and test datasets that were used in the paper. Note that some stories may need to be removed (marked "flagged") due to potentially offensive and/or harmful content.

#!/bin/bash
# rebuild the three split files from scratch
rm -f train_encoded.csv val_encoded.csv test_encoded.csv
echo "in script $1"
while read -r line
do
  inp=(${line})        # first field: plot id; last field: split name
  plot=${inp[0]}
  outfile=${inp[-1]}
  grep "${plot}" *RAKE*.csv >> "${outfile}_encoded.csv"
done < "$1"

will do it from the created wikiplot.kwRAKE.csv.

References to 'dev_encoded.csv' throughout should probably be 'val_encoded.csv'; the naming is somewhat inconsistent.
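
Assuming the snippet above is saved as, say, make_splits.sh (name illustrative) in the directory containing wikiplot.kwRAKE.csv, it would be invoked as bash make_splits.sh wikiplots_splits.txt, producing train_encoded.csv, val_encoded.csv and test_encoded.csv. This assumes each line of wikiplots_splits.txt starts with the plot id and ends with the split name (train/val/test), which is what the inp[0]/inp[-1] indexing implies.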

jfrancis99 commented Aug 29 '20 14:08


Hi, I ran into the same preprocessing problem in "2. Steps for extracting outlines". Could you share the processed file? Thanks!

wentinghome commented Dec 29 '20 18:12


Hi, have you solved your problem? I ran into the same problem.

AIRicky commented Jul 21 '21 01:07


grep will work better here with an _ appended after ${plot}, to avoid matching plot-10, plot-100, plot-101, etc. when searching for plot-1. grep's -m 1 flag might accomplish the same thing, too, though each story spans several rows in the csv, so the anchored match is the safer fix:

#!/bin/bash
# rebuild the three split files from scratch
rm -f train_encoded.csv val_encoded.csv test_encoded.csv
echo "in script $1"
while read -r line
do
  inp=(${line})        # first field: plot id; last field: split name
  plot=${inp[0]}
  outfile=${inp[-1]}
  grep "${plot}_" *RAKE*.csv >> "${outfile}_encoded.csv"   # _ anchors the storyid_{int} format
done < "$1"
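
To see the difference the anchor makes, a tiny standalone illustration (row contents are made up):

rows = ["plot-1_0,K,...", "plot-10_0,K,...", "plot-100_2,K,..."]

# Unanchored substring match: "plot-1" hits all three rows.
print([r for r in rows if "plot-1" in r])

# Anchored with the trailing underscore from the storyid_{int} format:
print([r for r in rows if r.startswith("plot-1_")])  # only the plot-1 row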

bryanjohns commented Nov 17 '21 01:11