Questions about log interpretation, seems paradoxical
This log line confuses me: my batch size is 513 and the iteration time is 98.83 s, so the throughput should be 513 / 98.83 ≈ 5.19 samples/s. The logged iteration time and throughput clearly contradict each other; could someone tell me how to interpret them? Thanks!
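For reference, the arithmetic behind that expectation (a minimal sketch using only the numbers above):

# Expected throughput if 98.83 s were the cost of a single full iteration.
BATCH_SIZE = 513        # samples per global batch
ITERATION_TIME = 98.83  # seconds per iteration, as shown in the progress bar
print(BATCH_SIZE / ITERATION_TIME)  # ~5.19 samples/s, far from the logged 41.595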
My training configuration:
# ColossalAI Version: v0.1.3
HIDDEN_SIZE = 2048
BATCH_SIZE = 513
NUM_EPOCHS = 1
SEQ_LEN = 2048
NUM_MICRO_BATCHES = 513  # BATCH_SIZE // NUM_MICRO_BATCHES == 1, i.e. one sample per micro-batch
TENSOR_SHAPE = (BATCH_SIZE // NUM_MICRO_BATCHES, SEQ_LEN, HIDDEN_SIZE)
parallel = dict(
    pipeline=3,                      # 3 pipeline stages
    tensor=dict(mode='1d', size=1),  # tensor parallelism effectively disabled (size=1)
)
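As a sanity check on the shape arithmetic (a sketch mirroring the constants above, not part of the original config): since NUM_MICRO_BATCHES equals BATCH_SIZE, each micro-batch carries a single sample.

# Hypothetical check of the derived micro-batch shape.
micro_batch_size = 513 // 513                  # BATCH_SIZE // NUM_MICRO_BATCHES -> 1
tensor_shape = (micro_batch_size, 2048, 2048)  # matches TENSOR_SHAPE above
print(tensor_shape)                            # (1, 2048, 2048)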
Hi, I believe there is some arithmetic error here. Let's investigate this problem 🔥
Hi @shjwudp, we use tqdm to show that progress bar: 98.83s/it is an average value presented by tqdm, while throughput=41.595 is the value for the latest step, presented by ColossalAI. We also provide an average result after each epoch. Is that number also abnormal?
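To make the explanation concrete, here is a hedged illustration (the per-step timings below are invented to roughly reproduce the logged numbers; they are not from the actual run): one very slow first iteration, e.g. warm-up, inflates the running average shown by the progress bar, while the per-step throughput reflects only the most recent iteration.

# Illustrative only: invented step timings showing how an averaged s/it
# can disagree with the latest step's samples/s.
BATCH_SIZE = 513
step_times = [444.9, 12.33, 12.33, 12.33, 12.33]  # hypothetical: slow first step

avg_time = sum(step_times) / len(step_times)     # what an averaged s/it resembles
latest_throughput = BATCH_SIZE / step_times[-1]  # what a per-step log reports

print(f"{avg_time:.2f} s/it")                # ~98.84 s/it
print(f"{latest_throughput:.3f} samples/s")  # ~41.606 samples/s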
@FrankLeeeee @kurisusnowdeng Thanks for your response, it resolved my confusion perfectly! BTW, I found that the GPT example has excellent scaling efficiency but poor raw computing performance: under the same hyperparameter configuration and resource usage, DeepSpeed achieves 3x the throughput of ColossalAI. This blows my mind. Do you have plans to improve the performance of the GPT example? I think a lot of people would be interested in this.
We also provide an average result after each epoch. Is that number also abnormal?
@kurisusnowdeng I haven't run a full epoch, but I have a task that will run an epoch tomorrow, then I'll sync my findings with you :)
@shjwudp We are looking forward to your results.
@kurisusnowdeng The average result for the epoch is 32.005, which is closer to the logged throughput (41.595) than to the figure implied by the tqdm iteration time.
The codebase has been updated substantially since then. This issue was closed due to inactivity. Thanks.