
Efficient BLOOM Inference on PyTorch

Open stas00 opened this issue 3 years ago • 9 comments

WIP: Efficient BLOOM Inference on PyTorch

Covers Accelerate, DS-ZeRO, DS-Inference and hopefully a whole new implementation Nicolas created in Rust for servers.

Preview: https://github.com/stas00/blog/blob/bloom-inference-pytorch/bloom-inference-pytorch.md

cc: @sgugger, @Narsil

stas00 avatar Jul 20 '22 19:07 stas00

I think for the big picture we need an extra paragraph explaining ...

I was hoping you'd write a whole section on the server-style inference. I have even added you as an author ahead of time ;)

Perhaps this article could just discuss the server inference, summarizing your discoveries but not bothering with the code, and perhaps another article down the road could cover the details of the code, if you'd be inspired to do so?

Because otherwise this article so far focuses on fast inference, and the server solution is a related but very different problem to solve; the token generation approach, though, is the same w/ or w/o the server, IMHO.

stas00 avatar Aug 02 '22 15:08 stas00

@stas00 I added a few things, don't hesitate to comment or modify directly.

I felt that the first part, clarifying single-forward latency vs. throughput (tokens/ms), was needed; otherwise readers would think we were actually getting 1 token in 0.7 ms (which isn't correct, afaik).

Latency = total run time / 5 (number of cycles) / 100 (tokens per cycle) should be roughly the time T shown in my diagram. (The past vs. initial inference passes are different, but I'm ignoring that for brevity, and you already mention it in the blog.)
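To make the arithmetic concrete, here is the same calculation as a small shell sketch; the total run time is a made-up placeholder, not a measured number:

```shell
# Per-token latency derived from throughput (illustrative numbers only).
total_s=3.5   # total wall-clock time for the whole benchmark, in seconds (hypothetical)
cycles=5      # number of generation cycles
tokens=100    # new tokens generated per cycle
awk -v t="$total_s" -v c="$cycles" -v n="$tokens" \
    'BEGIN { printf "%.1f ms/token\n", t / c / n * 1000 }'
# prints: 7.0 ms/token
```

Note this is a throughput-derived average; the first forward pass (no past key/values) is slower than the subsequent ones, as mentioned above.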

Usually I prefer to open a PR against yours rather than modify your work directly, but you seemed OK with the idea of pushing on top.

I checked the images on the blog in both light and dark themes and everything should be readable. I also did a Grammarly pass in a separate PR; hopefully I integrated the fixes/improvements correctly.

Narsil avatar Aug 04 '22 10:08 Narsil

Hmm, editing the PR directly is fine, but you force-pushed, which means I now have no idea what you changed short of manually diffing against the last version before your force-pushes. If you had edited the files and committed them directly, GitHub would have shown the diff.

Please avoid force-pushing in PRs that are collaborative work. Besides the diff issue, it also makes it very difficult to push local changes that were made before the force-push but weren't yet committed.

But what's done is done; I will just read it as new text.

stas00 avatar Aug 05 '22 05:08 stas00

I think the images you added aren't linked correctly; you can see in the preview https://github.com/stas00/blog/blob/bloom-inference-pytorch/bloom-inference-pytorch.md that they 404.

stas00 avatar Aug 05 '22 05:08 stas00

but you force-pushed, which means I now have no idea what you changed short of manually diffing against the last version before your force-pushes.

That was simply a rebase. But I understand. I never got why GitHub forces us to force-push after a rebase, but there's no real way around it, afaik. Here I shouldn't have, but I did it out of habit because of the merge conflict.

And the commit list is the same, so my changes are still readable as independent commits: https://github.com/huggingface/blog/pull/432/commits/60d5fb8b08767934b6639b4f548e01ba0306d9ca (Of course I could have pulled the rug out from under you, but I didn't, at least to the best of my knowledge.)
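For what it's worth, there are two ways to make this safer in a collaborative PR (a sketch, assuming the branch tracks `origin` and the base branch is `main`):

```shell
# Option 1: resolve the conflict with a merge commit - no history rewrite,
# so collaborators' existing clones stay valid:
git fetch origin
git merge origin/main

# Option 2: if you do rebase, --force-with-lease refuses to overwrite
# remote commits your local clone hasn't fetched yet:
git rebase origin/main
git push --force-with-lease
```

The merge keeps every published commit reachable; `--force-with-lease` at least guards against clobbering work someone else pushed in the meantime.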

Narsil avatar Aug 05 '22 08:08 Narsil

I think the images you added aren't linked correctly; you can see in the preview https://github.com/stas00/blog/blob/bloom-inference-pytorch/bloom-inference-pytorch.md that they 404.

Actually, the links are correct. I verified on a local build of the blog. It seems your original PR had the same URL and the same failure for your image: https://github.com/stas00/blog/blob/1a62600fa3ce36a9949b00908e4ed3a7681a3b68/bloom-inference-pytorch.md

The thing is that GitHub's markdown rendering is confused by the absolute path of the image (which doesn't resolve in GitHub's file viewer but is correct once the post is on the Hub).
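Concretely, the difference looks like this (the image file name here is a made-up example, not the post's actual asset):

```markdown
<!-- Absolute path: resolves on the Hub, 404s in GitHub's file viewer -->
![setup](/blog/assets/bloom-inference-pytorch/setup.png)

<!-- Relative path: resolves in GitHub's file viewer, relative to the .md file -->
![setup](assets/bloom-inference-pytorch/setup.png)
```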

Narsil avatar Aug 05 '22 13:08 Narsil

You don't need to force-push to rebase; here is a tool I wrote that I use everywhere to rebase: https://github.com/stas00/git-tools/tree/master/git-rebase

And now anybody with a previous clone has to figure out how to resolve the divergent-branch problem:

$ git pull
hint: You have divergent branches and need to specify how to reconcile them.
hint: You can do so by running one of the following commands sometime before
hint: your next pull:
hint:
hint:   git config pull.rebase false  # merge (the default strategy)
hint:   git config pull.rebase true   # rebase
hint:   git config pull.ff only       # fast-forward only
hint:
hint: You can replace "git config" with "git config --global" to set a default
hint: preference for all repositories. You can also pass --rebase, --no-rebase,
hint: or --ff-only on the command line to override the configured default per
hint: invocation.
fatal: Need to specify how to reconcile divergent branches.

stas00 avatar Aug 05 '22 15:08 stas00

Basically, my local clone is completely broken right now after these force-pushes. I'm just going to make a new clone.

stas00 avatar Aug 05 '22 15:08 stas00

I'm not sure how to proceed - you wiped out all the editorial work that has been done so far, deleting all the fixes :(

edit: not all of it - only my uncommitted changes got lost

I'm going to rewind the text to the last good version before your force pushes and then we will need to carefully re-add your changes.

Fixed the image paths as well, so it all gets rendered correctly here: https://github.com/stas00/blog/blob/bloom-inference-pytorch/bloom-inference-pytorch.md

Overall a great addition - and thanks for the visuals, which help a lot. Though in some areas I think one needs to put oneself in the shoes of a reader, as I wasn't able to follow everywhere.

stas00 avatar Aug 05 '22 15:08 stas00