Amey Agrawal
@webvictim this would be a great feature to have. Can we merge this if possible?
My initial conclusion was wrong. I had been running different configurations on g2.8xlarge and p2.8xlarge so that the model could fit on the smaller K520 cards. But strangely it...
The one with batch norm works; the other doesn't. Digging further, it seems that only the first batch normalization layer is important for the network to work. I tried...
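For context on what that first layer is doing, here is a minimal NumPy sketch of batch normalization over a batch. This is illustrative only and is not the code from the experiment above; the function name and parameters are my own.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature (column) over the batch dimension (rows),
    then apply a learnable scale (gamma) and shift (beta).
    Hypothetical helper, not from the experiment discussed above."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = batch_norm(x)
# After normalization, each column has approximately zero mean and unit variance.
```

Placing this as the first transformation keeps early-layer activations in a well-scaled range, which is one plausible reason only the first batch-norm layer mattered here.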
I guess [this](https://drive.google.com/a/bits-pilani.ac.in/file/d/0BwyWFeou7mr2cTdlcF9zV253NGs/view?usp=sharing) should sort out the problem. Also, the logo now weighs 5 KB instead of 44 KB.
Sorry for the inconvenience, I have fixed the link.
@VictorSanh can you please review this PR?
I am creating a PR for the same.
https://github.com/huggingface/knockknock/pull/62
@simon-mo the sarathi fork also has an extensive metric logging framework, if that is of interest - https://microsoft-research.wandb.io/msri-ai-infrastructure/llm-simulator-v2/reports/Sarathi-Benchmark-Suite-Demo--VmlldzoyNDMx?accessToken=d81jj8r843ntfhjle51uac1y57jvm80urmizil5rxt9jcafqnd1eib5swevpfejx
Let me know if you want us to create a PR for this