minGPT

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

Results: 79 minGPT issues

# Motivation People may want to use minGPT in different precisions (fp16, fp32, bf16). This PR adds that option to the library. Only floating point precisions...
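As a rough illustration of what per-precision training can look like in PyTorch (a sketch only; the `precision` option and `training_step` helper below are hypothetical, not this PR's actual interface):

```python
import contextlib
import torch

# Hypothetical precision switch; the option name and helper are illustrative,
# not the PR's actual interface.
PRECISIONS = {'fp16': torch.float16, 'bf16': torch.bfloat16}

def training_step(model, optimizer, x, y, precision='fp32'):
    # run the forward pass under autocast for fp16/bf16, plain fp32 otherwise
    ctx = (torch.autocast(device_type='cuda', dtype=PRECISIONS[precision])
           if precision in PRECISIONS else contextlib.nullcontext())
    with ctx:
        logits, loss = model(x, y)   # minGPT's GPT.forward returns (logits, loss)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()                  # fp16 usually also wants a GradScaler (see the AMP issue below)
    optimizer.step()
    return loss.item()
```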

No actual functionality changes, just a lot of cleanup to make the project look like it belongs in the present instead of something half abandoned from 10 years ago. Some...

Dear @karpathy, Thanks for this nice GPT implementation. It really helps a lot! When comparing this GPT with other Transformers, I found that here all the attention layers were using...

Hi, this is an awesome repository. I was reading through `AdditionDataset` and noticed the block size is calculated as follows: ```python # +1 due to potential carry overflow, but then...
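For context, the reasoning that comment alludes to can be spelled out with a small worked example (a paraphrase of the dataset's logic, not the exact code):

```python
ndigit = 2
# a and b each have ndigit digits; their sum can spill into one extra digit
# (carry overflow, e.g. 99 + 99 = 198), so the rendered sequence a|b|a+b has:
seq_len = ndigit + ndigit + (ndigit + 1)   # = 7 tokens for ndigit = 2
# the model only ever predicts the next token, so the final digit of the sum
# never has to be fed back in as input, hence the -1:
block_size = seq_len - 1                   # = 6
```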

I'm experimenting with `amp.autocast` (automatic mixed precision) with torch version '1.7.0a0+7036e91'. I find the performance has _not improved_ (it is slightly degraded: 36.2s for AMP vs 30.4s for FP32 with N=1...
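For reference, the usual AMP training step wraps the forward pass in autocast and scales the loss (a sketch, not the exact code from the issue; `model`, `optimizer`, `x`, `y` stand in for a minGPT model and one batch). AMP generally only pays off on GPUs with tensor cores and reasonably large matmuls, so a small model can come out slower, as observed here.

```python
import torch
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()

def amp_step(model, optimizer, x, y):
    optimizer.zero_grad(set_to_none=True)
    with autocast():                  # forward pass in mixed (fp16) precision
        logits, loss = model(x, y)
    scaler.scale(loss).backward()     # scale the loss to avoid fp16 gradient underflow
    scaler.step(optimizer)            # unscale gradients, then take the optimizer step
    scaler.update()
    return loss.item()
```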

use config n_head instead of hardcoded 4 heads in model attention block
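The reshape that the head count feeds into looks roughly like this (illustrative values; the point is simply that `n_head` comes from the config rather than a literal 4):

```python
import torch

B, T, n_embd, n_head = 2, 8, 48, 3           # hypothetical config values
assert n_embd % n_head == 0
x = torch.randn(B, T, n_embd)
# split the channel dimension into n_head heads of size n_embd // n_head
k = x.view(B, T, n_head, n_embd // n_head).transpose(1, 2)   # (B, n_head, T, head_dim)
```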

I think that you deliberately avoided this problem in the code :) If so, how should it be done?

`test_loss` is not defined when `test_dataset` is None, so I added a condition to check whether `test_dataset` is present or not.
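A sketch of what that guard might look like in the trainer's epoch loop (names follow minGPT's `Trainer`, but this is illustrative rather than the exact diff):

```python
def end_of_epoch(trainer, run_epoch, best_loss):
    test_loss = None
    if trainer.test_dataset is not None:   # only evaluate when a test set exists
        test_loss = run_epoch('test')
    # with no test set, fall back to treating every epoch's model as "good"
    good_model = trainer.test_dataset is None or test_loss < best_loss
    return test_loss, good_model
```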

I wish minGPT could also work with the HuggingFace Hub in order to be even more easily experienced by any user. I admit I don't know if this is feasible while...

In the pull request, I implemented a play_word example, which trains a GPT model to translate from modern English to Shakespearean English. It uses BPE to build the vocabulary.
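As a small usage sketch of BPE tokenization with the repo's `mingpt.bpe.BPETokenizer` (assuming that is the tokenizer the example relies on; it fetches the GPT-2 vocabulary and merges on first use, and the PR's dataset code may build its vocabulary differently):

```python
from mingpt.bpe import BPETokenizer

tokenizer = BPETokenizer()
modern = "You are welcome."
tokens = tokenizer(modern)            # (1, T) tensor of GPT-2 BPE token ids
print(tokens.shape)
print(tokenizer.decode(tokens[0]))    # decodes back to the original text
```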