moskewcz
well, assuming i didn't mess up the analysis and used the right inputs/etc., i get a runtime of 0.146s on the (non-intel) alexnet-owl prototxt you linked above, for a batch of 128...
again, if i got my #s right: if we assume 70M images (~65 epochs * 1.1M images/epoch; not sure if that's a good value or not) in 5 days to...
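for reference, here's a rough back-of-the-envelope check of those numbers -- the 0.146s-per-128-image-batch figure is from the comment above, 86400 s/day is assumed, and the 65 * 1.1M breakdown is the same guess as above; treat it as a sketch, not gospel:

```python
# rough sanity check of the #s above; the 0.146s-per-128-image-batch figure
# is from the earlier comment, everything else is just arithmetic
batch_time_s = 0.146            # measured runtime per batch (alexnet-owl, batch=128)
batch_size   = 128
images_total = 65 * 1.1e6       # ~65 epochs * ~1.1M images/epoch ~= 70M images
train_days   = 5

measured_ips = batch_size / batch_time_s              # ~877 images/s
required_ips = images_total / (train_days * 86400.0)  # ~165 images/s to finish in 5 days

print("measured: %.0f images/s, required: %.0f images/s" % (measured_ips, required_ips))
```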
sounds like a plan. make sure you fire up nvidia-smi while you're running it ... ;)
hmm, well, i was mostly joking and i mostly believe you. however, i'm not sure that what you say precludes the GPU being active. in fact, if, say, the new...
@ozabluda i think your analysis of the intel #s looks good and is believable. as per an above comment, we're guessing ~2.65 TFLOPS peak for the dual-socket 36-core machine intel used...
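for what it's worth, here's one plausible way to arrive at a ~2.65 TFLOPS peak figure for a dual-socket 36-core haswell-class box; note the 2.3 GHz clock and 32 SP FLOPs/cycle/core (AVX2: 8 lanes * 2 FMA units * 2 flops per FMA) are my assumptions here, not numbers taken from intel's writeup:

```python
# hypothetical peak-FLOPS calc for a dual-socket, 36-core haswell-class machine;
# clock and per-core throughput are assumptions, not figures from intel's writeup
cores           = 36            # 2 sockets * 18 cores
clock_ghz       = 2.3           # assumed base clock
flops_per_cycle = 8 * 2 * 2     # SP AVX2 lanes * 2 FMA units * (mul+add)

peak_tflops = cores * clock_ghz * flops_per_cycle / 1e3
print("peak ~= %.2f TFLOPS" % peak_tflops)   # ~2.65
```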
@ozabluda hmm, i'm not sure what you changed, but i guess it looks more/differently wrong to me now, still as per my (1) and (2). AFAIK all the caffe timings...
remember that add_codegen_annotations() is just a slightly generalized version of add_cnn_codegen_annotations(), which is already called for all the operations in the input graph in rtc_fwd.cc here: https://github.com/moskewcz/boda/blob/master/src/rtc_fwd.cc#L479 so, it doesn't...
in short, yes, that's the general idea. but, what you've sketched is (in some ways) 'the easy part' -- the harder part is making sure, in general, that proper transformations...
i'm not sure what you're asking or where you came up with that command line; can you explain your idea/reasoning?
okay, i think i understand your question somewhat. 1) i can't think of a fundamental reason that input/output wisdom files would be needed for profiling. but, i'd also not be...