Results 340 comments of Philip Turner

> but sadly I didn't observe any power consumption by the ANE module, which is too bad.

The neural engine can't be used for training anyway. It only supports Float16,...

Per https://github.com/pytorch/pytorch/issues/77753#issuecomment-1132230314, the GPU may be slower because it spends more of its time writing to new memory than actually executing. Also, since it continues writing to new memory on...
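A minimal Metal sketch of that effect (not PyTorch's actual allocator): it times a loop that writes into a freshly allocated buffer each iteration against one that reuses a single warm buffer. The buffer size and iteration count are arbitrary placeholders.

```swift
import Foundation
import Metal
import QuartzCore

// Sketch: compare writing to freshly allocated memory each iteration
// against writing to one reused buffer. A large gap means the loop is
// dominated by allocation and page faults, not by the writes themselves.
let device = MTLCreateSystemDefaultDevice()!
let length = 4 * 1024 * 1024 // 4 MB per buffer (arbitrary)
let iterations = 100

// Case 1: allocate a new shared buffer every iteration, then write to it.
var start = CACurrentMediaTime()
for _ in 0..<iterations {
    let fresh = device.makeBuffer(length: length, options: .storageModeShared)!
    memset(fresh.contents(), 1, length)
}
let freshTime = CACurrentMediaTime() - start

// Case 2: write to the same preallocated buffer every iteration.
let reused = device.makeBuffer(length: length, options: .storageModeShared)!
start = CACurrentMediaTime()
for _ in 0..<iterations {
    memset(reused.contents(), 1, length)
}
let reusedTime = CACurrentMediaTime() - start

print("fresh: \(freshTime) s, reused: \(reusedTime) s")
```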

I do have a workaround planned for my own framework ([s4tf/s4tf](https://github.com/s4tf/s4tf)) where I will bypass this restriction. It's described in https://github.com/AnarchoSystems/DeepSwift/issues/1#issuecomment-1129891796, although please don't comment on that thread. @albanD does...

I wonder if we could convert this to actual MPS code in Swift, then profile how long that code takes. That would determine whether the bottleneck is PyTorch’s fault and...
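As a sketch of what that experiment might look like, assuming MPSGraph (which the PyTorch MPS backend builds on) as the Swift entry point; the 512x512 matmul, zero-filled inputs, and iteration count are placeholders:

```swift
import Metal
import MetalPerformanceShadersGraph
import QuartzCore

// Sketch: run one matmul through MPSGraph directly and time it, to
// compare against the same op dispatched from PyTorch's MPS backend.
let mtlDevice = MTLCreateSystemDefaultDevice()!
let device = MPSGraphDevice(mtlDevice: mtlDevice)
let graph = MPSGraph()

let shape: [NSNumber] = [512, 512]
let a = graph.placeholder(shape: shape, dataType: .float32, name: "a")
let b = graph.placeholder(shape: shape, dataType: .float32, name: "b")
let c = graph.matrixMultiplication(primary: a, secondary: b, name: "c")

// Zero-filled inputs; the contents don't matter for timing.
let bytes = Data(count: 512 * 512 * MemoryLayout<Float>.stride)
let aData = MPSGraphTensorData(device: device, data: bytes, shape: shape, dataType: .float32)
let bData = MPSGraphTensorData(device: device, data: bytes, shape: shape, dataType: .float32)

// Warm up once so one-time graph compilation isn't counted.
_ = graph.run(feeds: [a: aData, b: bData], targetTensors: [c], targetOperations: nil)

let iterations = 100
let start = CACurrentMediaTime()
for _ in 0..<iterations {
    _ = graph.run(feeds: [a: aData, b: bData], targetTensors: [c], targetOperations: nil)
}
let avg = (CACurrentMediaTime() - start) / Double(iterations)
print("avg per matmul: \(avg * 1e6) µs")
```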

This might be why it’s so slow. Worth reading if you have the time. https://discuss.pytorch.org/t/sequential-throughput-of-gpu-execution/156303
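For context, a minimal sketch of the measurement that thread is about: committing one command buffer at a time and blocking on each, the way an eagerly executing framework effectively does. The loop count is arbitrary; the round-trip latency this measures is what caps sequential throughput.

```swift
import Metal
import QuartzCore

// Sketch: measure how many commit-and-wait round trips the driver can
// sustain per second. Each dependent op in an eager framework pays
// roughly this latency, regardless of how small the kernel is.
let device = MTLCreateSystemDefaultDevice()!
let queue = device.makeCommandQueue()!

let iterations = 1000
let start = CACurrentMediaTime()
for _ in 0..<iterations {
    let commandBuffer = queue.makeCommandBuffer()!
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()
}
let elapsed = CACurrentMediaTime() - start
print("round trips per second: \(Double(iterations) / elapsed)")
```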

The spike in microsecond-level overhead (CPU time avg) was discussed [here](https://github.com/pytorch/pytorch/issues/82707#issuecomment-1204672455). I think I’ve found a solution to it, but haven’t put it into practice with an RNN.

> Any recent plan to implement it?

I'm not planning to implement it in PyTorch; however, it's open source and I've explained it in great depth. Someone else could look...

A working RNN implementation probably won't happen within the next few weeks. I'm juggling a bunch of other projects simultaneously, so things will happen slowly regarding...

That strongly suggests it's a driver-overhead bottleneck: the CPU takes 1/4 as long when you go from 1000 to 250, i.e. time scales linearly with the number of operations, which is the signature of fixed per-op overhead. Perhaps it should take 1/16 ((1/4)^2) as long in...

Based on my experiments, driver overhead can be reduced significantly relative to a naive implementation. In the best case, there is a 100x reduction. In the average case, there is a...
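The comment doesn't spell out the technique here, but one standard way to get reductions of that order is batching: encoding many commands into a single command buffer instead of paying a commit-and-wait round trip per command. A minimal sketch under that assumption, using a trivial blit fill as a stand-in for a real kernel and arbitrary sizes:

```swift
import Metal
import QuartzCore

// Sketch: one command buffer per command (naive) versus all commands
// batched into a single command buffer. The GPU work is identical; only
// the number of driver round trips changes.
let device = MTLCreateSystemDefaultDevice()!
let queue = device.makeCommandQueue()!
let buffer = device.makeBuffer(length: 4096, options: .storageModePrivate)!
let commands = 1000

// Naive: one command buffer (and one round trip) per command.
var start = CACurrentMediaTime()
for _ in 0..<commands {
    let cmdbuf = queue.makeCommandBuffer()!
    let blit = cmdbuf.makeBlitCommandEncoder()!
    blit.fill(buffer: buffer, range: 0..<4096, value: 0)
    blit.endEncoding()
    cmdbuf.commit()
    cmdbuf.waitUntilCompleted()
}
let naive = CACurrentMediaTime() - start

// Batched: all commands in one command buffer, one round trip total.
start = CACurrentMediaTime()
let cmdbuf = queue.makeCommandBuffer()!
let blit = cmdbuf.makeBlitCommandEncoder()!
for _ in 0..<commands {
    blit.fill(buffer: buffer, range: 0..<4096, value: 0)
}
blit.endEncoding()
cmdbuf.commit()
cmdbuf.waitUntilCompleted()
let batched = CACurrentMediaTime() - start

print("naive: \(naive) s, batched: \(batched) s, speedup: \(naive / batched)x")
```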