ybsh

26 comments by ybsh

@LWisteria I ran Chainer's ```train_word2vec.py``` on titanv with the default epochs/iterations (i.e. the same as the [performance report](https://github.com/fixstars/clpy/wiki/chainer_example_performance_report)), but the execution times were much longer than the reported figures even...

@LWisteria Maybe not right now, but perhaps in the long run. I thought it would be a good thing to find hidden configuration differences (if there are any) for assuring...

Profiled ```train_word2vec.py``` with cProfile on titanv, epoch = 1.

### CuPy
Execution time: 5.791 s
- top 10 ```tottime```
```
ncalls tottime percall cumtime percall filename:lineno(function)
  1004  4.133   0.004   4.156...
```
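For reference, a minimal sketch of how such a listing can be produced with cProfile/pstats. The dump file name and the exact invocation are assumptions, not the command used above:

```python
import pstats

# Assuming the script was profiled along the lines of:
#   python -m cProfile -o word2vec.prof train_word2vec.py <args>
# ("word2vec.prof" is a hypothetical file name).
stats = pstats.Stats("word2vec.prof")

# Print the 10 entries with the largest own (non-cumulative) time,
# matching the "top 10 tottime" listing above.
stats.sort_stats("tottime").print_stats(10)
```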

Code location: https://github.com/fixstars/clpy/blob/3a97570b87ad57e85e837678c863b62cda6e4513/clpy/core/core.pyx#L2153
Now I'm moving on to Cython-level profiling.
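For context, Cython functions are invisible to cProfile unless profiling support is enabled at compile time. A minimal sketch of how that can be switched on when rebuilding the extension; the file name and setup layout are assumptions, not ClPy's actual build script:

```python
from setuptools import setup
from Cython.Build import cythonize

# Rebuild with profiling hooks so that cProfile can see Cython-level
# functions such as copy_from_host(). "memory.pyx" stands in for
# whichever .pyx files are being profiled; alternatively, add the
# directive comment "# cython: profile=True" at the top of the file.
setup(
    ext_modules=cythonize(
        "memory.pyx",
        compiler_directives={"profile": True},
    ),
)
```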

Cython-level profiling shows that ```train_word2vec.py``` spends most of its ClPy time in ```copy_from_host()```. Again, the Cython profiler output, sorted by cumulative execution time: ```...

Thank you. Perhaps it is time to turn to OpenCL tracing tools; it seems very hard to find out what is happening inside with Cython profiling tools alone...

@LWisteria OK, I will try that and see if the transfers are taking long or not.

Added ```clFinish()``` before the ```clEnqueueWriteBuffer``` call in ```memory.pyx:354(copy_from_host)```. As @LWisteria suggested might be the case, ```clEnqueueWriteBuffer``` was almost idle, waiting for previous tasks to complete 😇
```
ncalls tottime percall cumtime percall filename:lineno(function)...
```
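Not ClPy's actual ```memory.pyx``` change, but a standalone pyopencl sketch of the same measurement idea: drain the queue with ```finish()``` first, then time the write on its own, so time spent waiting on earlier queued work is not attributed to the transfer itself. All names here (queue, buf, host) are illustrative:

```python
import time
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

host = np.random.rand(1 << 20).astype(np.float32)
buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE, size=host.nbytes)

# 1) Wait for everything already enqueued to complete, so the write
#    below does not sit idle behind earlier kernels/transfers.
t0 = time.perf_counter()
queue.finish()
t1 = time.perf_counter()

# 2) Now time the host-to-device write itself.
cl.enqueue_copy(queue, buf, host)
queue.finish()
t2 = time.perf_counter()

print(f"waiting for previous work: {t1 - t0:.6f} s")
print(f"actual host->device copy:  {t2 - t1:.6f} s")
```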

Now that the data transfers proved not to be the culprit, I'll try OpenCL tracing tools.

I'm trying Python's [```trace``` module](https://docs.python.org/2/library/trace.html#cmdoption-trace-trackcalls).
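A minimal sketch of using the ```trace``` module with call tracking, along the lines of the ```--trackcalls``` option linked above; the traced function is a placeholder:

```python
import trace

def work():
    # Placeholder for the code being investigated,
    # e.g. one training iteration of train_word2vec.py.
    sum(range(1000))

# countcallers=1 records which function called which, the
# programmatic equivalent of `python -m trace --trackcalls`.
tracer = trace.Trace(count=0, trace=0, countcallers=1)
tracer.runfunc(work)

# Print the collected caller/callee relationships.
tracer.results().write_results(show_missing=False, summary=False)
```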