
Updating "Classifying Names with a Character-Level RNN"

Open mgs28 opened this issue 1 year ago • 12 comments

Fixes #1166

Description

Updating Sean's excellent RNN classification tutorial, which is now 8 years old and missing some newer PyTorch functionality.

  • Cannot use the default DataLoader to select batch sizes because "stack expects each tensor to be of equal size" and the names are all of different lengths. Instead, updated the code to use mini-batches without DataLoader functionality.
  • Introduced PyTorch's Dataset class to show how to split the data into train and test sets, which changes the training explanation (see the sketch after this list).
  • Rewrote pieces of the tutorial to use three classes to improve reuse (Data, DataSet and RNN).
  • Added a little more explanation of how RNNs score multi-character strings and their 2D matrix of tensors.
  • Changed evaluation from random training examples to the entire test set.
  • Removed some of the command-line explanations since notebooks are used more often.
  • Tried to preserve as much of the original text, functions, and style as possible.
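
The Dataset-based split looks roughly like this (an illustrative sketch; the class name, split ratio, and `samples` are placeholders, and the tutorial's actual Dataset also loads the name files and builds the one-hot character tensors):

import torch
from torch.utils.data import Dataset, random_split

class NamesDataset(Dataset):
    def __init__(self, samples):
        # samples: a list of (name_tensor, label) pairs, assumed loaded elsewhere
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]

alldata = NamesDataset(samples)  # `samples` assumed to exist
train_set, test_set = random_split(alldata, [0.85, 0.15],
                                   generator=torch.Generator().manual_seed(2024))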

Checklist

  • [x] The issue that is being fixed is referred to in the description (see above "Fixes #ISSUE_NUMBER")
  • [x] Only one issue is addressed in this pull request
  • [x] Labels from the issue that this PR is fixing are added to this pull request
  • [x] No unnecessary issues are included in this pull request.

cc @albanD

mgs28 avatar Jun 24 '24 16:06 mgs28

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/2954

Note: Links to docs will display an error until the docs builds have been completed.

:white_check_mark: No Failures

As of commit e31a1411077de66e678cbceb9ca6f0a67ba77b55 with merge base a91f6310bee185be97795e453670dca973b9bd26 (image): :green_heart: Looks good so far! There are no failures yet. :green_heart:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot[bot] avatar Jun 24 '24 16:06 pytorch-bot[bot]

Hi @mgs28!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

facebook-github-bot avatar Jun 24 '24 16:06 facebook-github-bot

@svekars - it looks like you are quite active in this repo; any chance you could help me with this? Thanks!

mgs28 avatar Jun 26 '24 01:06 mgs28

Added functionality to process training data in mini-batches to satisfy the original story. However, I had to use numpy + random to create batch indices from a given dataset.
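
Roughly, the batching amounts to this (a sketch; the actual helper's name and signature in the tutorial may differ):

import numpy as np

def make_batches(dataset, batch_size=64):
    # Shuffle indices and slice them into mini-batches, since the default
    # DataLoader can't stack the variable-length name tensors.
    indices = np.random.permutation(len(dataset))
    return [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]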

Also, simplified training so it was a closer match to https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html

mgs28 avatar Jun 28 '24 01:06 mgs28

@svekars - would you please help me with this tutorial update pull request or point me to someone who could?

mgs28 avatar Jul 08 '24 19:07 mgs28

cc: @spro

svekars avatar Jul 08 '24 20:07 svekars

Sorry about the spelling errors! I ran pyspelling and re-ran make html for the tutorial. This should pass those CI steps now.

I also added a story for me to come back and update the CONTRIBUTING.md to include some of these checks. (https://github.com/pytorch/tutorials/issues/2969)

Thanks @spro @svekars !

mgs28 avatar Jul 10 '24 16:07 mgs28

@spro and @svekars - I significantly cut the training time, although it is faster on my CPU than my GPU; it runs in 72 seconds on my local CPU. I also added a default device setting so it picks up CUDA on your build machines, to hopefully make it faster.
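
The device selection is just the standard check (sketch):

import torch

# Prefer CUDA when it's available (e.g. on the build machines), otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")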

Thanks!

mgs28 avatar Jul 11 '24 22:07 mgs28

@jbschlosser - thank you for the lovely suggestions for improvement. If possible, I'd like to split my response into two parts:

First, the edits to my existing content.

  1. Excellent point on NameData. I removed it and used helper functions.
  2. I'm glad you like the Dataset addition - that was my prompt for doing this.
  3. Thanks for letting me know the multiple definitions were confusing. I focused it on one simple one, particularly since training was split from the object.
  4. I added training as a separate function. I will go learn more about those third party trainers since I haven't used them.

Second, I would really like to use nn.RNN if possible. There are very few tutorials that mention it, and everyone seems to base their RNN builds on this tutorial. However, to solve this task I think I need a network with layers like [57, 128, 18], and it looks like the default Elman network is stuck at [57, 18], even with extra layers.

Is it best practice to inherit from nn.RNN and add my own fully connected output layer, or am I misunderstanding something?

Thanks!

mgs28 avatar Sep 13 '24 19:09 mgs28

To make it simpler, I assume extending the nn.RNN class might look like this (which runs about 40% faster):

class MyRNN(nn.RNN):
    def __init__(self, input_size, hidden_size, output_size):
        super(MyRNN, self).__init__(input_size, hidden_size)

        self.h2o = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, line_tensor):
        # Pass the input through the RNN layers
        rnn_out, hidden = super(MyRNN, self).forward(line_tensor)
        output = self.h2o(hidden[0])
        output = self.softmax(output)

        return output

mgs28 avatar Sep 13 '24 20:09 mgs28

Is it best practice to inherit from nn.RNN and add my own fully connected output layer, or am I misunderstanding something?

Rather than inherit, we generally encourage composition. In this case, something like:

# better name needed :)
class MyRNN(nn.Module):
    def __init__(self, ...):
        super().__init__()
        self.rnn = nn.RNN(...)
        self.h2o = nn.Linear(...)

    ...
    def forward(self, x):
        _, hidden = self.rnn(x)
        output = self.h2o(hidden[0])
        return F.log_softmax(output, dim=1)

jbschlosser avatar Sep 16 '24 16:09 jbschlosser

Thanks @jbschlosser! I used nn.RNN via composition and changed some of the surrounding text. That forced the addition of a few terms to the repo dictionary. I appreciate you teaching me something again, and hopefully the tutorial is better for it.
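
Roughly, the composed module now looks like this (an illustrative sketch; the sizes come from the [57, 128, 18] figures mentioned earlier, and the names may differ from the tutorial's final version):

import torch.nn as nn
import torch.nn.functional as F

class CharRNN(nn.Module):
    def __init__(self, input_size=57, hidden_size=128, output_size=18):
        super().__init__()
        # nn.RNN handles the recurrence; a Linear layer maps the final
        # hidden state to the language classes.
        self.rnn = nn.RNN(input_size, hidden_size)
        self.h2o = nn.Linear(hidden_size, output_size)

    def forward(self, line_tensor):
        # line_tensor assumed shaped (seq_len, batch, n_letters)
        _, hidden = self.rnn(line_tensor)   # hidden: (1, batch, hidden_size)
        output = self.h2o(hidden[0])        # (batch, output_size)
        return F.log_softmax(output, dim=1)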

mgs28 avatar Sep 17 '24 01:09 mgs28

@jbschlosser - thanks!

  1. The training converges around 35-40 iterations. I am happy to train until convergence or leave some room for the reader. I also left it lighter so that build time is faster for the CI bot. What would you prefer?

  2. Comparisons with the original confusion matrix: It's train/test split + training time. If I train with

all_losses = train(rnn, alldata, n_epoch=100, learning_rate=0.1, report_every=5)

and evaluate on alldata, then I get a bright diagonal line that looks pretty similar to the original. I imagine with some parameter tuning I could get closer.

mgs28 avatar Sep 24 '24 01:09 mgs28

I also left it lighter so that build time is faster for the CI bot. What would you prefer?

It's a good point that we want to balance this some. That said, I think it'd be nice to be a little bit closer to the original (at least somewhat of a diagonal line confusion matrix). Hopefully we can strike a reasonable balance where we're beginning to see a diagonal line trend. No need to spend a ton of time parameter tuning though :) that's okay left as an exercise to the reader.

jbschlosser avatar Sep 24 '24 20:09 jbschlosser

No problem @jbschlosser - I tuned some of the parameters to get pretty close to a diagonal confusion matrix. It still gets a little confused between English and Scottish as well as Korean and Chinese. However, there's a strong diagonal in the confusion matrix. Thanks!

mgs28 avatar Sep 25 '24 01:09 mgs28

Thanks for the updates! I'm good with it; will let @svekars make the final approval :)

jbschlosser avatar Sep 26 '24 16:09 jbschlosser

@svekars - do you have any further comments on this update? I'm happy to address. Thanks!

mgs28 avatar Oct 17 '24 14:10 mgs28

@svekars - Sorry on two accounts:

  1. Sorry for the delay... Some personal stuff came up and tonight was the earliest I could address it.
  2. I made reviewing your editing changes harder: I manually made some of your changes via a commit and accepted others here in the UI. I apologize for that; I was not thinking tonight.

I made one departure from your edits. You suggested removing a comma in "train test and validate our models" but when I re-read it, I actually meant "train, test and validate our models". I updated the text to read as such.

I apologize for my delay and for muddying up this review, but I greatly appreciate your feedback. All your points were great. Thanks!

mgs28 avatar Oct 31 '24 01:10 mgs28

@svekars - Happy Tuesday. Is there anything else I need to do to help merge this tutorial in?

Thanks!

mgs28 avatar Dec 03 '24 14:12 mgs28

Looks good - would be great to also fix these user warnings:

[Screenshot: user warnings from the tutorial build]

svekars avatar Dec 03 '24 20:12 svekars

@svekars - Fixed it. Sorry I missed that warning! Thanks

mgs28 avatar Dec 03 '24 22:12 mgs28

@mgs28 I really want to merge this, but I wonder what happened in the update that increased the build time for this tutorial from ~6 minutes to 22 minutes.

svekars avatar Dec 06 '24 21:12 svekars

@svekars - I also want you to merge it in. :)

History: The training is different (batches instead of random samples) and if you look at the comments around July 11th, we tinkered with the parameters to make the confusion matrix better.

My best suggestion is to set device = cpu at the beginning of the script. With 10% of my laptop CPU, it takes about 4 minutes. Can you share some of the specs of the deployment machine? That might give me a bit more to dig into. Thanks!

mgs28 avatar Dec 06 '24 22:12 mgs28

My best suggestion is to set device = cpu at the beginning of the script. With 10% of my laptop CPU, it takes about 4 minutes. Can you share some of the specs of the deployment machine? That might give me a bit more to dig into. Thanks!

It's a 1 GPU machine. Can you try with device = cpu?

svekars avatar Dec 07 '24 02:12 svekars

@svekars - good idea. I removed the device options at the top of the script. pyspelling passed and the build html looks good. Please let me know if I missed something. Thanks!

mgs28 avatar Dec 09 '24 16:12 mgs28

well, now it's 43 minutes...

svekars avatar Dec 09 '24 19:12 svekars

@svekars - well... clearly that's the wrong direction... I submitted a change to speed it back up

What's your goal?

In July and more recently, I compared the old code to this new addition.

  • I added an optimizer (SGD) instead of manually updating weights as in the prior tutorial (sketched after this list).
  • I am using datasets (the original reason for the update) instead of randomly selecting 100,000 examples.
  • Batches require training over 9.8x the data (55 iterations through 17,063 points = 938,465 examples). We're about 3x faster per example but training on more examples.
  • The original article tests on the training data; we changed to withholding a separate test set. This makes the performance look worse, so we increased training time.
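
To make the comparison concrete, the optimizer change amounts to roughly this (a sketch; `rnn`, the batch format, and the helper name are assumptions, and the tutorial's actual train() loop has more bookkeeping):

import torch
import torch.nn as nn
import torch.optim as optim

criterion = nn.NLLLoss()
optimizer = optim.SGD(rnn.parameters(), lr=0.1)   # `rnn` assumed defined as above

def train_step(batch):
    # batch: list of (line_tensor, label_tensor) pairs of varying lengths
    optimizer.zero_grad()
    # Forward each name separately (lengths differ), then average the losses.
    loss = torch.stack([criterion(rnn(x), y) for x, y in batch]).mean()
    loss.backward()
    optimizer.step()   # replaces the manual weight update from the old tutorial
    return loss.item()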

All of these changes bring this tutorial more in line with best practices. To meet the time goal, I can reduce training by a percentage.

  • Original training (estimated 22 minutes): accuracy = 0.796745240688324; confusion matrix has a strong diagonal

  • Half training (estimated 11 minutes): accuracy = 0.7974094748497009; confusion matrix has a less strong diagonal

  • Quarter training (estimated 5 minutes): accuracy = 0.8003985285758972; confusion matrix has no diagonal

I checked in the 11-minute version for you, which I think is a good compromise. Do you agree?

mgs28 avatar Dec 09 '24 23:12 mgs28

Agree on the 11-minute compromise - thank you for adjusting! Please add a note that the number of epochs is reduced for the purposes of the example, and that we recommend setting it to a larger number for a higher level of accuracy.

svekars avatar Dec 10 '24 19:12 svekars

@svekars - good idea and done. I added a comment where training is specified. The first item in the end exercises also mentions hyperparameter tuning as a good place to start. Does that work? Thanks!

mgs28 avatar Dec 10 '24 21:12 mgs28