Correct method to load 2.7B?

Open BlinkDL opened this issue 2 years ago • 4 comments

Hi, I can run the 1.3B model using the benchmark code here, but the 2.7B model is still not working (it gives bad results) with the following params:

import argparse

# Values for 2.7B; trailing comments give the values that work for 1.3B
parser = argparse.ArgumentParser(description='H3 generation benchmarking')
parser.add_argument('--dmodel', type=int, default=2560)  # 1.3B: 2048
parser.add_argument('--nlayer', type=int, default=32)  # 1.3B: 24
# nargs='+' with type=int parses "--attn-layer-idx 8 16 24" into a list of ints
parser.add_argument('--attn-layer-idx', nargs='+', type=int, default=[8, 16, 24])  # 1.3B: [8, 16]
parser.add_argument('--nheads', type=int, default=20)  # 1.3B: 16
parser.add_argument('--ckpt', type=str, default='/fsx/BlinkDL/CODE/_PUBLIC_/H3/H3-2.7B/model-3attn.pt')
parser.add_argument('--promptlen', type=int, default=1024)
parser.add_argument('--genlen', type=int, default=128)
args = parser.parse_args()
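
A side note on the list-valued `--attn-layer-idx` option: the defaults above are used as-is, so this only matters when the value is overridden on the command line. The sketch below is a minimal, standalone illustration (the `build_parser` helper is a hypothetical name, not part of the benchmark script) of declaring such an option with `nargs='+'` and `type=int`. With `type=list`, argparse would call `list()` on the raw string and split it into single characters.

```python
import argparse


def build_parser():
    # Hypothetical helper for illustration only; not from the H3 repository.
    parser = argparse.ArgumentParser(description="list-valued option sketch")
    # nargs="+" collects one or more values; type=int converts each of them.
    parser.add_argument("--attn-layer-idx", nargs="+", type=int, default=[8, 16, 24])
    return parser


# No override: the default list is returned unchanged.
defaults = build_parser().parse_args([])
print(defaults.attn_layer_idx)

# Override, equivalent to passing `--attn-layer-idx 8 16` on the command line.
overridden = build_parser().parse_args(["--attn-layer-idx", "8", "16"])
print(overridden.attn_layer_idx)

# What type=list would do to a string argument: split it into characters.
print(list("8,16"))
```
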

BlinkDL avatar Jan 26 '23 13:01 BlinkDL

We're looking into this, stay tuned!

DanFu09 avatar Jan 28 '23 02:01 DanFu09

Thanks for the bug report, we've just fixed this. The mapping between old and new parameter names contained a mistake.
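
For readers hitting similar symptoms with other checkpoints, here is a rough sketch of how a name mapping can be applied and then checked. Everything in it is an assumption for illustration: the rename rules, the key names, and the `remap_keys` / `check_keys` helpers are hypothetical and are not the actual H3 mapping or fix. The idea is that comparing the remapped keys against the keys the model expects surfaces a wrong rule immediately, rather than leaving some weights unloaded and the outputs quietly bad.

```python
import re

# Hypothetical rename rules (old pattern -> new replacement), first match wins.
# These are NOT the real H3 parameter names.
RENAME_RULES = [
    (re.compile(r"^blocks\.(\d+)\.norm\."), r"layers.\1.norm1."),
    (re.compile(r"^blocks\.(\d+)\."), r"layers.\1."),
]


def remap_keys(state_dict):
    """Return a new dict with old-style keys renamed to new-style keys."""
    remapped = {}
    for old_key, value in state_dict.items():
        new_key = old_key
        for pattern, replacement in RENAME_RULES:
            if pattern.search(old_key):
                new_key = pattern.sub(replacement, old_key)
                break
        if new_key in remapped:
            raise ValueError(f"two old keys map to the same new key: {new_key}")
        remapped[new_key] = value
    return remapped


def check_keys(remapped, expected_keys):
    """Return (missing, unexpected) key lists relative to what the model expects."""
    got = set(remapped)
    expected = set(expected_keys)
    return sorted(expected - got), sorted(got - expected)


# Toy checkpoint with placeholder values instead of tensors.
old_ckpt = {"blocks.0.norm.weight": 1, "blocks.0.mlp.weight": 2, "embed.weight": 3}
new_ckpt = remap_keys(old_ckpt)
missing, unexpected = check_keys(
    new_ckpt,
    ["layers.0.norm1.weight", "layers.0.mlp.weight", "embed.weight", "head.weight"],
)
print(missing, unexpected)
```

An empty `missing` and `unexpected` list is the condition to look for before trusting generation results; in the toy example above, `head.weight` is reported as missing.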

tridao avatar Jan 28 '23 16:01 tridao

Great. How about the configurations for 125M and 355M?

BlinkDL avatar Jan 29 '23 21:01 BlinkDL

Here are examples of how to load all the models, along with example outputs: https://github.com/HazyResearch/H3/blob/main/examples/README.md

DanFu09 avatar Jan 30 '23 02:01 DanFu09