kodonnell comments

Results: 55 comments of kodonnell

Lossy QOI Variant

FYI it was suggested that https://github.com/phoboslab/qoi/issues/145 might be more appropriate here. Wouldn't mind someone checking the results, as they were pretty compelling, especially as it's "free".

Lossy QOI Variant

I'm afraid I don't have any such images - can you send some through? I was going to use the Python image similarity package which supports other metrics, but didn't...

Sub-optimal performance on Qualcomm Adreno GPUs

My Intel results from this change (if relevant) are [here](https://github.com/CNugteren/CLBlast/issues/257#issuecomment-383410884).

New tuning results

Some more results. Note that Beignet (which I used) is 10-20% slower than Intel NEO. [Intel(R) HD Graphics 6000 BroadWell U-Processor GT3.zip](https://github.com/CNugteren/CLBlast/files/1959410/Intel.R.HD.Graphics.6000.BroadWell.U-Processor.GT3.zip)

Can we tune faster?

> Typically running the kernel itself is the least amount of the total tuning time.

For quick kernels, yes. Some of the kernels were reporting 30-second runtimes, if I...

Can we tune faster?

> A quick search ...

But that does hint at a solution (though, as above, it'd involve rewriting the kernels).

> So unless someone has a solution, I'm afraid...

Can we tune faster?

Also - I did originally intend this issue to include other ideas, not just my own, so I'll leave this open. (You're welcome to close it though.)

Re-thinking the auto-tuning

Regarding tuning, have you considered doing this in Python? Performance shouldn't (?) be a problem, as the underlying kernels are the same. It's arguably going to reach a wider audience,...

Re-thinking the auto-tuning

Following up that comment - you could also consider not doing the tuning directly, but allowing users to implement their own searches (though you'd probably have some default ones implemented)....
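A rough sketch of what "users implement their own searches" might look like in Python. Everything here is hypothetical and for illustration only: `tune`, `exhaustive_search`, `random_search`, `fake_benchmark`, and the parameter names `WGS` and `VW` are assumptions, not part of CLBlast or its tuner, and the benchmark is a stand-in for actually timing a kernel.

```python
import itertools
import random

# Hypothetical parameter space: name -> candidate values (names are made up).
SPACE = {"WGS": [16, 32, 64, 128], "VW": [1, 2, 4, 8]}


def exhaustive_search(space):
    """Default strategy: yield every combination of parameter values."""
    names = sorted(space)
    for values in itertools.product(*(space[n] for n in names)):
        yield dict(zip(names, values))


def random_search(space, samples=8, seed=0):
    """Alternative strategy: yield a fixed number of random configurations."""
    rng = random.Random(seed)
    for _ in range(samples):
        yield {name: rng.choice(values) for name, values in space.items()}


def tune(space, benchmark, search=exhaustive_search):
    """Run `benchmark` on each configuration proposed by `search`.

    `search` is any callable taking the space and returning an iterable of
    configurations, so users can plug in their own strategy.
    Returns the fastest configuration and its measured time.
    """
    best_config, best_time = None, float("inf")
    for config in search(space):
        elapsed = benchmark(config)
        if elapsed < best_time:
            best_config, best_time = config, elapsed
    return best_config, best_time


def fake_benchmark(config):
    """Stand-in for timing a real kernel; fastest at WGS=64, VW=4."""
    return abs(config["WGS"] - 64) + abs(config["VW"] - 4) + 1.0


best_exhaustive, time_exhaustive = tune(SPACE, fake_benchmark)
best_random, time_random = tune(SPACE, fake_benchmark, search=random_search)
```

The only contract in this sketch is that a search takes the parameter space and yields configurations, so a default search and a user-written one are interchangeable from the tuner's point of view.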

Re-thinking the auto-tuning

Cool - no problem.

> Not sure how that will work.

I was more referring to a suspicion that most users interested in OpenCL will be in machine learning etc.,...