
Issue with to_fp16()

Open Manas-Embold opened this issue 4 years ago • 7 comments

Hi Max,

I trained the 344M model using gpt-2-simple (the dataset was Java code for auto code completion) and saved the checkpoint. I converted the model to PyTorch using:

! cd '/content/checkpoint' && transformers-cli convert --model_type gpt2 --tf_checkpoint '/content/checkpoint/run1/' --pytorch_dump_output '/content/checkpoint/run1/pytorch' --config '/content/checkpoint/run1/hparams.json'

When I load the model normally:

from aitextgen import aitextgen

config = '/content/checkpoint/run1/pytorch/config.json'
ai = aitextgen(model="/content/checkpoint/run1/pytorch/pytorch_model.bin", config=config)

There are no issues, and I can generate easily:

ai.generate(n=1, prompt="system.out.", max_length=100)

OUTPUT: system.out.println( + id);

However, since I want to convert this to fp16 for faster inference, I converted the model to fp16 as follows:

from aitextgen import aitextgen

config = '/content/checkpoint/run1/pytorch/config.json'
ai = aitextgen(model="/content/checkpoint/run1/pytorch/pytorch_model.bin", config=config, to_gpu=True, to_fp16=True)

When I call generate now, it outputs English instead of Java:

ai.generate(n=1, prompt="system.out.", max_length=100)

OUTPUT: system.out. loc character decidedally healthy ultimately points belie mass nearly regidedot price clicklike make TodayocaInd unlike journal Norretene links Good void et attackalsAnSD 54giving sing high Assassatelyhus Y humansware concerned connectionsSt� was believesligmartacing Geteworkamedann·aultrict dep2013� daughtermentructure couldentiallyrolloth confrontted Archbi suitiffge beaut Ed industward Sony* thereileOMrugateg rented Birminghamvironment underinceeg Windows intense static

Manas-Embold avatar Nov 25 '20 11:11 Manas-Embold

Any thoughts on where I am going wrong in the conversion? I think that after conversion it's loading the default GPT-2 English language model instead of my GPT-2 model trained on Java code.

Manas-Embold avatar Nov 25 '20 11:11 Manas-Embold

When I use to_gpu=True and to_fp16=True for loading, I get English as output. When I use just to_fp16=True and skip to_gpu=True, I get proper Java output.

This looks strange.
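
A quick way to see what actually got loaded in each case (this assumes aitextgen exposes the underlying Transformers model as ai.model, which may vary by version):

# Inspect the precision and device of the loaded weights.
param = next(ai.model.parameters())
print(param.dtype, param.device)  # e.g. torch.float16, cuda:0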

Manas-Embold avatar Nov 26 '20 05:11 Manas-Embold

to_fp16() is sorta beta and not fully tested. Ideally the ONNX support which I intend to add will handle this better.

However, that output is just weird in that it's pseudorandom as opposed to fully random, which may imply a different issue in the pipeline.

minimaxir avatar Nov 30 '20 18:11 minimaxir

Alright, thanks for reviewing!

junkgear avatar Nov 30 '20 18:11 junkgear

Tested: yes, it's random output. I assume something changed upstream in Transformers, so I might have to remove it (there also doesn't seem to be a speed increase anymore). Will add a warning for now.
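
For illustration, a hypothetical shape such a warning could take (this is not aitextgen's actual code; the helper name is invented):

import logging

logger = logging.getLogger("aitextgen")

def load_fp16(model):  # hypothetical helper, for illustration only
    # Warn before applying the experimental fp16 conversion.
    logger.warning(
        "to_fp16 is experimental and may produce incoherent output."
    )
    return model.half()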

minimaxir avatar Dec 01 '20 00:12 minimaxir

I'm able to use fp16 with sensible outputs if I use:

import torch

# Wrapping generation in autocast restores sensible fp16 output.
with torch.cuda.amp.autocast():
    ai.generate(...)

Interestingly, I seem to be getting slower generation using fp16 on an RTX 2060, though the halved memory usage is a plus.
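
Putting it together with the paths from earlier in the thread, a minimal sketch of the workaround looks like this:

import torch
from aitextgen import aitextgen

# Paths taken from earlier in this thread.
config = '/content/checkpoint/run1/pytorch/config.json'
ai = aitextgen(model="/content/checkpoint/run1/pytorch/pytorch_model.bin",
               config=config, to_gpu=True, to_fp16=True)

# Autocast runs the fp16 forward passes with mixed-precision safety,
# which restores coherent output.
with torch.cuda.amp.autocast():
    ai.generate(n=1, prompt="system.out.", max_length=100)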

briansemrau avatar Jan 27 '21 23:01 briansemrau

I was really puzzled by this: I found to_fp16 was generating sensible, normal content on Google Colab despite the warning messages, but it was totally bizarre in production. It turned out the PyTorch versions were different: Colab was on torch 1.8.1 and CUDA 11.1, while my server was on torch 1.7 and CUDA 11.0. Once I upgraded the libraries on my server, FP16 generation worked correctly again, so it may be worth updating the warning for people on older PyTorch versions?
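
For reference, a quick way to compare the two environments (standard PyTorch introspection):

import torch

print(torch.__version__)          # e.g. 1.8.1
print(torch.version.cuda)         # CUDA version torch was built with, e.g. 11.1
print(torch.cuda.is_available())  # confirm the GPU is actually visible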

jonnyplatt avatar Apr 28 '21 08:04 jonnyplatt