ybsh

26 comments by ybsh

@LWisteria I ran Chainer's ```train_word2vec.py``` on titanv with the default epochs/iterations (i.e. the same as the [performance report](https://github.com/fixstars/clpy/wiki/chainer_example_performance_report)), but the execution times were much longer than the reported figures even...

@LWisteria Maybe not right now, but perhaps in the long run. I thought it would be a good thing to find hidden configuration differences (if there are any) for assuring...

Profiled ```train_word2vec.py``` with cProfile on titanv, epoch = 1.

### CuPy
Execution time: 5.791 s
- top 10 ```tottime```
```
ncalls tottime percall cumtime percall filename:lineno(function)
  1004  4.133   0.004   4.156...
```
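For reference, a minimal sketch of how such a listing can be produced with cProfile/pstats. The dump file name and the exact invocation are assumptions, not the command used above:

```python
import pstats

# Assuming the script was profiled along the lines of:
#   python -m cProfile -o word2vec.prof train_word2vec.py <args>
# ("word2vec.prof" is a hypothetical file name).
stats = pstats.Stats("word2vec.prof")

# Print the 10 entries with the largest own (non-cumulative) time,
# matching the "top 10 tottime" listing above.
stats.sort_stats("tottime").print_stats(10)
```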

Code location: https://github.com/fixstars/clpy/blob/3a97570b87ad57e85e837678c863b62cda6e4513/clpy/core/core.pyx#L2153
Now I'm moving on to Cython-level profiling.
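For context, Cython functions are invisible to cProfile unless profiling support is enabled at compile time. A minimal sketch of how that can be switched on when rebuilding the extension; the file name and setup layout are assumptions, not ClPy's actual build script:

```python
from setuptools import setup
from Cython.Build import cythonize

# Rebuild with profiling hooks so that cProfile can see Cython-level
# functions such as copy_from_host(). "memory.pyx" stands in for
# whichever .pyx files are being profiled; alternatively, add the
# directive comment "# cython: profile=True" at the top of the file.
setup(
    ext_modules=cythonize(
        "memory.pyx",
        compiler_directives={"profile": True},
    ),
)
```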

Cython-level profiling shows that ```train_word2vec.py``` spends most of its ClPy time in ```copy_from_host()```. Again, the Cython profiler output, sorted by cumulative execution time: ```...

Thank you. Perhaps it is time to turn to OpenCL tracing tools; it seems very hard to find out what is happening inside with Cython profiling tools alone...

@LWisteria OK, I will try that and see if the transfers are taking long or not.

Added ```clFinish()``` before the ```clEnqueueWriteBuffer``` call in ```memory.pyx:354(copy_from_host)```. As @LWisteria suggested might be the case, ```clEnqueueWriteBuffer``` was almost idle, waiting for previous tasks to complete 😇
```
ncalls tottime percall cumtime percall filename:lineno(function)...
```
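Not ClPy's actual ```memory.pyx``` change, but a standalone pyopencl sketch of the same measurement idea: drain the queue with ```finish()``` first, then time the write on its own, so time spent waiting on earlier queued work is not attributed to the transfer itself. All names here (queue, buf, host) are illustrative:

```python
import time
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

host = np.random.rand(1 << 20).astype(np.float32)
buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE, size=host.nbytes)

# 1) Wait for everything already enqueued to complete, so the write
#    below does not sit idle behind earlier kernels/transfers.
t0 = time.perf_counter()
queue.finish()
t1 = time.perf_counter()

# 2) Now time the host-to-device write itself.
cl.enqueue_copy(queue, buf, host)
queue.finish()
t2 = time.perf_counter()

print(f"waiting for previous work: {t1 - t0:.6f} s")
print(f"actual host->device copy:  {t2 - t1:.6f} s")
```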

Now that the data transfers proved not to be the culprit, I'll try OpenCL tracing tools.

I'm trying Python's [```trace``` module](https://docs.python.org/2/library/trace.html#cmdoption-trace-trackcalls).
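A minimal sketch of using the ```trace``` module with call tracking, along the lines of the ```--trackcalls``` option linked above; the traced function is a placeholder:

```python
import trace

def work():
    # Placeholder for the code being investigated,
    # e.g. one training iteration of train_word2vec.py.
    sum(range(1000))

# countcallers=1 records which function called which, the
# programmatic equivalent of `python -m trace --trackcalls`.
tracer = trace.Trace(count=0, trace=0, countcallers=1)
tracer.runfunc(work)

# Print the collected caller/callee relationships.
tracer.results().write_results(show_missing=False, summary=False)
```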