pytorch-lightning

configure_model with deepspeed_stage3 goes wrong

Open cyr0930 opened this issue 1 year ago • 21 comments

Bug description

I tried to train a Hugging Face transformers model with deepspeed_stage3, but when I load the model from a checkpoint as in the code below, an error occurs.

I think the checkpoint and the model template are fine, because it works if I configure the model in __init__. (It might go wrong only in a multi-GPU environment?)

What version are you seeing the problem on?

v2.1

How to reproduce the bug

def configure_model(self):
    self.model = SOME_MODEL.from_pretrained(some_path, config=some_config)

Error messages and logs

RuntimeError: Error(s) in loading state_dict for GPTJForCausalLM:
 	size mismatch for transformer.wte.weight: copying a param with shape torch.Size([32768, 3072]) from checkpoint, the shape in current model is torch.Size([0]).
 	size mismatch for transformer.h.0.ln_1.weight: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([0]).
...
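One possible reading of the torch.Size([0]) shapes, offered as an assumption rather than a confirmed diagnosis: with ZeRO stage 3, parameters created inside configure_model may already be partitioned, so the local module holds zero-sized placeholders and a plain state-dict load compares a full-size checkpoint tensor against an empty one. The sketch below reproduces only that shape comparison in pure Python. find_size_mismatches and the shape dictionaries are hypothetical stand-ins, not Lightning, DeepSpeed, or transformers APIs.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_size_mismatches(
    checkpoint_shapes: Dict[str, Shape],
    model_shapes: Dict[str, Shape],
) -> List[str]:
    """Return one message per parameter whose shapes disagree (hypothetical helper)."""
    messages = []
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is None:
            continue  # missing keys are a different kind of error
        if model_shape != ckpt_shape:
            messages.append(
                f"size mismatch for {name}: checkpoint {list(ckpt_shape)}, "
                f"current model {list(model_shape)}"
            )
    return messages


# Shapes taken from the log above.
checkpoint = {
    "transformer.wte.weight": (32768, 3072),
    "transformer.h.0.ln_1.weight": (3072,),
}
# Assumption: sharded parameters show up locally as empty placeholders.
sharded_model = {name: (0,) for name in checkpoint}
# A model built without sharding keeps the full shapes.
full_model = dict(checkpoint)

if __name__ == "__main__":
    for line in find_size_mismatches(checkpoint, sharded_model):
        print(line)
```

If this reading is right, it would also be consistent with the report that building the model in __init__ works, since the parameters would still have their full shapes at load time.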

Environment

Current environment
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 2.0):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):

More info

No response

cc @borda

cyr0930 avatar Jan 10 '24 10:01 cyr0930

I can confirm it doesn't seem to work in kitty; I ran fc-cache as well as fc-list to double-check that the fonts were installed.

Works fine on alacritty though, awesome fonts!

PThorpe92 avatar Nov 10 '23 02:11 PThorpe92

Seems like something is off with the spacing. kitty is really picky about all of the characters being the same size. I tried doing the fix described here to force the spacing to be 100 on the Xenon variant, and that got kitty to be able to use it, but it didn't look right. The cursor was positioned low relative to the text, and it didn't seem like all the characters had a consistent baseline.

mjm avatar Nov 10 '23 03:11 mjm

You can use dconf-editor to force the GNOME Terminal font setting, and it looks great there. It's annoying that the UI filters it from the list, though.

https://fosstodon.org/@petejohanson/111384211402484279

petejohanson avatar Nov 10 '23 05:11 petejohanson

Maybe it is the Panose flag that is missing. See also #17

Finii avatar Nov 11 '23 04:11 Finii

GNOME Terminal's Preferences dialog offers to choose from the fonts where pango_font_family_is_monospace() returns true.

I believe this method looks at a certain flag defined within font files, rather than actually measuring its glyphs, but I'm not familiar with the details here.

egmontkob avatar Nov 11 '23 10:11 egmontkob

Could not find out what pango looks for, but usually that is Panose...

def force_panose_monospaced(font):
    """ Forces the Panose flag to monospaced if they are unset or halfway ok already """
    # For some Windows applications (e.g. 'cmd'), they seem to honour the Panose table
    # https://forum.high-logic.com/postedfiles/Panose.pdf
    panose = list(font.os2_panose)
    if panose[0] == 0: # 0 (1st value) = family kind; 0 = any (default)
        panose[0] = 2 # make kind latin text and display
        logger.info("Setting Panose 'Family Kind' to 'Latin Text and Display' (was 'Any')")
        font.os2_panose = tuple(panose)
    if panose[0] == 2 and panose[3] != 9:
        logger.info("Setting Panose 'Proportion' to 'Monospaced' (was '%s')", panose_proportion_to_text(panose[3]))
        panose[3] = 9 # 3 (4th value) = proportion; 9 = monospaced
        font.os2_panose = tuple(panose)

[1] https://forum.high-logic.com/postedfiles/Panose.pdf
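The function above depends on a font object, a logger, and a panose_proportion_to_text helper that are not shown here. As a minimal sketch of the same rule on a bare 10-digit Panose tuple, assuming only what the snippet itself states (index 0 is the family kind, index 3 is the proportion, 9 means monospaced), something like the following could be used to experiment. The name with_monospaced_panose and the constants are hypothetical.

```python
from typing import Sequence, Tuple

PANOSE_FAMILY_KIND = 0   # index of the 'Family Kind' digit
PANOSE_PROPORTION = 3    # index of the 'Proportion' digit
FAMILY_ANY = 0
FAMILY_LATIN_TEXT = 2
PROPORTION_MONOSPACED = 9


def with_monospaced_panose(panose: Sequence[int]) -> Tuple[int, ...]:
    """Return a copy of the Panose digits with the proportion forced to monospaced.

    Only 'Any' and 'Latin Text' families are touched, mirroring the snippet above.
    """
    if len(panose) != 10:
        raise ValueError("a Panose classification has exactly 10 digits")
    values = list(panose)
    if values[PANOSE_FAMILY_KIND] == FAMILY_ANY:
        values[PANOSE_FAMILY_KIND] = FAMILY_LATIN_TEXT
    if values[PANOSE_FAMILY_KIND] == FAMILY_LATIN_TEXT:
        values[PANOSE_PROPORTION] = PROPORTION_MONOSPACED
    return tuple(values)


if __name__ == "__main__":
    print(with_monospaced_panose((0,) * 10))  # (2, 0, 0, 9, 0, 0, 0, 0, 0, 0)
```

Returning a new tuple instead of mutating a font keeps the rule testable without a font editor; applying it to a real font is left to whatever tool provides the os2_panose attribute.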

Finii avatar Nov 11 '23 13:11 Finii

Kitty and GNOME Terminal run their own checks to see whether a font is really monospaced. The issue is the spacing: running fc-list :spacing=100 does not list any of the Monaspace fonts, but it does list all the fonts that are available in GNOME Terminal, for example.
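The fc-list check can be scripted. The sketch below builds the same query and, if fc-list is installed, collects the reported family names. The helper names are hypothetical; the spacing values are the fontconfig constants (proportional 0, dual 90, mono 100, charcell 110).

```python
import shutil
import subprocess
from typing import List

# Spacing constants as defined by fontconfig.
FC_SPACING = {"proportional": 0, "dual": 90, "mono": 100, "charcell": 110}


def fc_list_command(spacing: str) -> List[str]:
    """Build an fc-list call that prints the families with the given spacing."""
    return ["fc-list", f":spacing={FC_SPACING[spacing]}", "family"]


def families_with_spacing(spacing: str) -> List[str]:
    """Run fc-list and return the family names; empty if fc-list is not installed."""
    if shutil.which("fc-list") is None:
        return []
    result = subprocess.run(
        fc_list_command(spacing), capture_output=True, text=True, check=False
    )
    return sorted({line.strip() for line in result.stdout.splitlines() if line.strip()})


def is_listed(family_fragment: str, families: List[str]) -> bool:
    """Case-insensitive substring match against the reported family names."""
    return any(family_fragment.lower() in family.lower() for family in families)


if __name__ == "__main__":
    mono = families_with_spacing("mono")
    print("Monaspace reported as mono:", is_listed("Monaspace", mono))
```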

marcaurele avatar Nov 12 '23 17:11 marcaurele

This also affects Konsole, although in that case you can override it by ticking show all fonts. (It still doesn't work properly on Konsole, but doing this at least lets you select it.)

ToxicFrog avatar Nov 22 '23 00:11 ToxicFrog

fonts-conf doesn't think that these fonts are monospace. So one could force it with this little config:

<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
    <match target="scan">
        <test name="family" compare="contains">
            <string>Monaspace</string>
        </test>
        <edit name="spacing">
            <const>dual</const>
        </edit>
    </match>
</fontconfig>

Put it in ~/.config/fontconfig/conf.d/20-monaspace.conf and run fc-cache -f.
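For anyone who prefers to script this step, here is a minimal sketch that checks the snippet is well-formed XML before writing it. install_conf is a hypothetical helper, and the demo writes to a temporary directory rather than to the real configuration directory.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# The config from the comment above, unchanged.
MONASPACE_CONF = """<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
    <match target="scan">
        <test name="family" compare="contains">
            <string>Monaspace</string>
        </test>
        <edit name="spacing">
            <const>dual</const>
        </edit>
    </match>
</fontconfig>
"""


def install_conf(conf_dir: Path, name: str = "20-monaspace.conf") -> Path:
    """Validate the snippet and write it into conf_dir, returning the file path."""
    ET.fromstring(MONASPACE_CONF)  # raises ParseError if the XML is malformed
    conf_dir.mkdir(parents=True, exist_ok=True)
    target = conf_dir / name
    target.write_text(MONASPACE_CONF, encoding="utf-8")
    return target


if __name__ == "__main__":
    # Demo only: write into a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        print(install_conf(Path(tmp) / "conf.d"))
```

To apply it for real, pass Path.home() / ".config/fontconfig/conf.d" as conf_dir and then run fc-cache -f as described above.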

alexeyten avatar Nov 22 '23 07:11 alexeyten

Whoops, GH auto closed this when I merged #125

We'll produce a new build soon and check that this (and related issues) are solved.

idan avatar Nov 30 '23 02:11 idan

Any idea when a new build will be produced?

kwmlodozeniec avatar Jan 11 '24 10:01 kwmlodozeniec

Would it be possible to document the build process so it could be done by the community?

kwmlodozeniec avatar Feb 23 '24 09:02 kwmlodozeniec

Whoops, GH auto closed this when I merged #125

We'll produce a new build soon and check that this (and related issues) are solved.

Any idea when a new build will be produced? Or did I miss the build instructions to build it myself?

xf0e avatar Mar 25 '24 10:03 xf0e

Or did I miss the build instructions to build it myself?

Well, I guess there are no build instructions needed, just open the .glyphs file(s) and File -> Export...

But there is no commit with anything new here, or did I miss that? In a branch maybe?

Screenshot 2024-03-25 at 13 24 15

Glyphs 3.1.2 shown, sorry that is non-free software

Edit: Probably you can open the .glyphs files with something which is not Glyphs - I have no clue (due to lack of need)

Finii avatar Mar 25 '24 12:03 Finii

@Finii a number of tweaks went in since the release that is currently available.

kwmlodozeniec avatar Mar 25 '24 13:03 kwmlodozeniec

Oh, You are right :-D

image

Edit:

But the sources are untouched:

image

Finii avatar Mar 25 '24 13:03 Finii

The Panose flags were set, and that solves at least some of the issues raised all over the place.

image

kwmlodozeniec avatar Mar 25 '24 13:03 kwmlodozeniec

Oh my... that's why I cannot see my own commit :woman_facepalming:

image

Usually my fork is called fork and upstream is origin, maybe I had no time back then ;)

Finii avatar Mar 25 '24 13:03 Finii

I could upload the fixed files in a branch of my fork, if that helps. Well, in fact that would not be allowed due to the RFN, strictly speaking. :grimacing:

Finii avatar Mar 25 '24 13:03 Finii

Btw, I do not know if Panose is sufficient, another far more severe issue is

  • #132

Unfortunately that is far from easily fixable; it is a conceptual problem :unamused:

Finii avatar Mar 25 '24 14:03 Finii

Indeed, it would be great if the owners could do this or at least hand it over to the community, but there might be reasons beyond my understanding that prevent that.

kwmlodozeniec avatar Mar 25 '24 14:03 kwmlodozeniec

Fixed in version 1.1

heathercran avatar May 21 '24 22:05 heathercran

Oh, You are right :-D

image

What's this a screenshot from? I find it a lot more readable than the default git log --graph.

ToxicFrog avatar Jul 10 '24 21:07 ToxicFrog

What's this a screenshot from? I find it a lot more readable than the default git log --graph.

That's tig (git spelled backwards) https://jonas.github.io/tig/doc/tig.1.html

Tig is an ncurses-based text-mode interface for git. It functions mainly as a Git repository browser, but can also assist in staging changes for commit at chunk level and act as a pager for output from various Git commands.

I usually do not like tools that run git commands for me (like lazygit), but I do use tig as a git log / git show substitute, and tig blame, where you can easily jump up and down through the commits.

All Linux distributions as well as Homebrew have packages.

The only other git tool I use is fugitive, the vim plugin (in neovim). I prefer to interact with git directly.

Finii avatar Jul 11 '24 05:07 Finii