
[BUG] Loss drops, model still produces gibberish?

Open MichelNivard opened this issue 1 year ago • 4 comments

Describe the bug

After 5300 iterations the loss is near 2.7. Is the model still supposed to spit out near-gibberish at that point?

To Reproduce

Running on CPU on a MacBook Air M2, omitting the model.cuda() line.

Expected behaviour

Some kind of convergence on sentences that are at least English-ish?

Screenshots

[image]

Additional context

Maybe my expectations are just off and I should train way way more?

MichelNivard avatar Feb 29 '24 19:02 MichelNivard

@MichelNivard try training it now and see what happens. I've made many optimizations.

kyegomez avatar Mar 03 '24 17:03 kyegomez

Okay, digging into it later today, thanks!

MichelNivard avatar Mar 03 '24 21:03 MichelNivard

Hi, I trained the model to completion using the train.py script, although I used a larger batch size and fewer epochs because I trained on a different GPU.

training loss: 2.462737798690796
validation loss: 2.5802037715911865

However, the model still produces gibberish:

nlsl,slontpg -ytasetcratiioec m  eenu u- nol b m=&o eliets ao =e raersly rif  rc&ssp eaeteen se llr l vc o&roi eet e-e ialsl dsssenr-cffso&- clafsebnnnu&o&ld&&s l&t;spe &e&n g=cciobod& re broen b o&  geposc efi&lu& lcercudrondllailo&na&dnienhi it en h & f&k& e lo&&p  n t ilng,itptoe& &l &opc-pi   mr&& l-=o&l &eetnsc& rdhe&ctn&e air std lciedeimm=ap&&c&ttoyi&c&a;&  e aa aa&s&oelaabueaconksts&    e&glll r& orrhad    ecn etant&c &   te& nc t& m  ugoleetcic&&eadtryr&hl eelairfd &prnldsiectl&sar fnup c&ie a c&in

The validation line was

'ml]  === The Octave Harmonica ===  Octave harmonicas have two reeds per hole.  The two reeds are tuned to the same note a perfect octave apart.  Many share their basic design with the tremolo harmonica explained above and are built upon this "Weiner system" of construction.  Octave harmonicas also come in what is called the "Knittlinger system".  In this design the top and bottom reed-plates contain all of the blow and draw notes for either to lower or higher pitched set of reeds.  The comb is constructed so that the blow and draw reeds on each reed-plate are paired side-by-side in a single chamber in the same manner as on a standard diatonic but that the top and bottom pairs each have their own chamber.  Thus, in a C harmonica the higher pitched C blow and D draw found in the first "hole" would be placed side-by-side on the upper reed-plate and share a single chamber in the comb and the lower pitched C blow and D draw would be placed side-by-side on the bottom reed-plate and sha'
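For context on the loss values reported above, here is a rough sketch that converts a cross-entropy loss into bits per token and perplexity and compares it with a uniform-guess baseline. It assumes the loss is a mean cross-entropy in nats per token (the default for PyTorch's cross-entropy) and that the model predicts one byte at a time from a 256-symbol vocabulary. Both are assumptions based on the sample text, not something confirmed in this thread, and `describe_loss` is a hypothetical helper name.

```python
import math


def describe_loss(loss_nats, vocab_size):
    """Convert a mean cross-entropy loss (in nats) into easier-to-read numbers.

    Assumption: loss_nats is averaged per predicted token and uses the
    natural logarithm.
    """
    return {
        # Same quantity expressed in bits instead of nats.
        "bits_per_token": loss_nats / math.log(2),
        # Effective number of equally likely choices per token.
        "perplexity": math.exp(loss_nats),
        # Loss of a model that guesses uniformly over the vocabulary.
        "uniform_baseline_nats": math.log(vocab_size),
    }


# Validation loss reported in this thread; vocab_size=256 is an assumption.
stats = describe_loss(2.5802037715911865, vocab_size=256)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions a loss near 2.58 corresponds to a perplexity of roughly 13, meaning the model is still about as uncertain as if it were choosing among a dozen or so equally likely characters at each step. That would be consistent with samples that have plausible letter frequencies but no real words.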

xwin avatar Mar 07 '24 17:03 xwin

Could we add proper checkpointing to the training loop in train.py?

I've tried torch.save({}), but the resulting file can't be opened with Netron for validation. I'm obviously missing something.
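A minimal sketch of what checkpointing could look like, assuming a standard PyTorch model and optimizer. The helper names `save_checkpoint` and `load_checkpoint`, the dictionary keys, and the `nn.Linear` stand-in model are all hypothetical and not taken from train.py.

```python
import os
import tempfile

import torch
from torch import nn


def save_checkpoint(path, model, optimizer, step):
    """Write model and optimizer state plus the step counter to `path`."""
    torch.save(
        {
            "step": step,
            "model_state_dict": model.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
        },
        path,
    )


def load_checkpoint(path, model, optimizer):
    """Restore state written by save_checkpoint and return the saved step."""
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model_state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    return checkpoint["step"]


# Stand-in model (assumption); in practice this would be the model from train.py.
model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.pt")
save_checkpoint(checkpoint_path, model, optimizer, step=100)

# Restore into freshly constructed objects to check the round trip.
restored_model = nn.Linear(4, 2)
restored_optimizer = torch.optim.Adam(restored_model.parameters(), lr=1e-3)
restored_step = load_checkpoint(checkpoint_path, restored_model, restored_optimizer)
```

Note that a file written this way is a pickled dictionary of tensors rather than a description of the network graph, which may be why a viewer such as Netron shows little of use. Exporting the model to a graph format such as ONNX is the more common route for that kind of inspection.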

JohnnyOpcode avatar Apr 13 '24 10:04 JohnnyOpcode

Stale issue message

github-actions[bot] avatar Jun 12 '24 12:06 github-actions[bot]