Yahya Lmallas
> There's functional changes in here mixed with others, which makes it hard to read and review.
>
> * add types (fine)
> * rename variables (why?)
> *...
> `x[::-1]` vs. `reversed`
>
> ```
> X = [1, 2, 3, 4, 5, 6, 7, 8]
>
> def f1(x):
>     return sum(reversed(x))
>
> def f2(x):
>     ...
> ```
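Here is a minimal sketch of that comparison. It assumes the point is that both forms produce the same sum. The difference is that `reversed(x)` returns a lazy iterator over the list, while `x[::-1]` first builds a full reversed copy. The names `sum_reversed` and `sum_sliced` are hypothetical and are not a reconstruction of the quoted `f1`/`f2`. The `timeit` loop only shows how one might measure the difference; no timings are claimed here.

```python
import timeit
from functools import partial

X = [1, 2, 3, 4, 5, 6, 7, 8]


def sum_reversed(x: list[int]) -> int:
    # reversed() yields the items lazily; no new list is allocated
    return sum(reversed(x))


def sum_sliced(x: list[int]) -> int:
    # x[::-1] materializes a reversed copy of the list before summing
    return sum(x[::-1])


if __name__ == "__main__":
    assert sum_reversed(X) == sum_sliced(X) == 36
    for fn in (sum_reversed, sum_sliced):
        # timeit.timeit accepts a zero-argument callable as its statement
        print(fn.__name__, timeit.timeit(partial(fn, X), number=10_000))
```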
> Overall looks pretty good.
>
> I bet the core gains come from breaking out and caching the helper functions, I think that's a win for both speed and...
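Here is a minimal sketch of what "breaking out and caching" a helper can look like. It assumes the helper is a pure function with hashable arguments. `render_dtype`, `render_many`, and the mapping inside them are hypothetical and exist only for illustration. `functools.lru_cache` is the standard-library decorator that memoizes such a function, so repeated calls with the same argument skip the work.

```python
from functools import lru_cache


# Hypothetical helper, hoisted to module level instead of being redefined
# inside a hot function on every call. Caching is only safe because the
# function is pure and its argument is hashable.
@lru_cache(maxsize=None)
def render_dtype(name: str) -> str:
    # Stand-in for some non-trivial per-call work
    mapping = {"float32": "float", "float16": "half", "int32": "int"}
    return mapping.get(name, name)


def render_many(names: list[str]) -> list[str]:
    # Repeated names are served from the cache after their first lookup
    return [render_dtype(n) for n in names]
```

`render_dtype.cache_info()` reports hits and misses, which is a quick way to check whether the cache is actually being used.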
As per https://ece.northeastern.edu/groups/nucar/Analogic/Class5-D-Extensions.pdf we need:

* `#pragma OPENCL EXTENSION cl_khr_fp64 : enable` (optional)
* `#pragma OPENCL EXTENSION cl_amd_fp64 : enable` (required for AMD)
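Here is a minimal sketch that assumes the pragmas are emitted as a preamble and prepended to OpenCL C source generated from Python. The `#if defined(...)` guards are a choice made here and are not prescribed by the linked document. OpenCL C compilers define a macro named after each supported language extension. With the guards, `cl_khr_fp64` is enabled only where the compiler defines it, and the preamble falls back to `cl_amd_fp64` on drivers that expose only the AMD extension. `FP64_PREAMBLE` and `with_fp64` are hypothetical names.

```python
# Hypothetical preamble prepended to OpenCL C kernels that use double.
FP64_PREAMBLE = """\
#if defined(cl_khr_fp64)
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#elif defined(cl_amd_fp64)
#pragma OPENCL EXTENSION cl_amd_fp64 : enable
#endif
"""


def with_fp64(kernel_src: str) -> str:
    """Prepend the double-precision extension pragmas to an OpenCL C kernel."""
    return FP64_PREAMBLE + kernel_src


if __name__ == "__main__":
    src = "__kernel void add_one(__global double *a) { a[get_global_id(0)] += 1.0; }\n"
    print(with_fp64(src))
```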
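This is what `ceil` should be:

```python
def ceil(self: Tensor) -> Tensor:
  # truncate toward zero via an int32 cast
  b = self.cast(dtypes.int32).contiguous()
  # round up only where truncation dropped a positive fractional part
  return ((self - b) > 0).where(b + 1, b).cast(self.dtype)
```

but this returns default input_tensor...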
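Here is a scalar sketch of the same truncate-then-adjust idea in plain Python. It lets you check the logic independently of tinygrad. `math.trunc` stands in for the `int32` cast, since both round toward zero, and `ceil_via_trunc` is a hypothetical name. Negative inputs already round up when truncated, so only a positive leftover fraction triggers the `+ 1`. The tensor version also assumes its inputs fit in `int32`, while Python's `math.trunc` has no such limit.

```python
import math


def ceil_via_trunc(x: float) -> float:
    # Truncate toward zero, like the int32 cast in the Tensor version
    b = float(math.trunc(x))
    # Add 1 only when truncation rounded a positive fraction down
    return b + 1.0 if (x - b) > 0 else b


if __name__ == "__main__":
    for v in [-2.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 3.999]:
        assert ceil_via_trunc(v) == math.ceil(v), v
    print("matches math.ceil")
```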
Many tests pass locally but fail on CI. It appears to be an Intel-related problem that isn't reproducible on Apple Silicon.
Can you update the failing tests?
Here we go: https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf

Apple is not clear about the Metal 3 GPU family for the following GPUs in this table: **AMD Vega, AMD 5000-series, 6000-series, Intel UHD Graphics 630, Intel Iris Plus Graphics**...
> Still need to test other backends, only tested with OpenCL. CI for OpenCL fails because `91:1:26: warning: unsupported OpenCL extension 'cl_khr_fp16' - ignoring`

Check if this works:

```
#ifdef...
```
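One way to write such a guard is sketched below. It assumes the kernel preamble is generated from Python. Each enable pragma is wrapped in `#ifdef <extension>`, so a compiler that does not define the `cl_khr_fp16` macro never sees the pragma. That should avoid the "unsupported OpenCL extension" warning. Kernels that do arithmetic on `half` still need a device that supports the extension. `guarded_extensions` is a hypothetical helper name.

```python
def guarded_extensions(*extensions: str) -> str:
    """Emit one #ifdef-guarded enable pragma per OpenCL extension name."""
    lines: list[str] = []
    for ext in extensions:
        # The compiler defines a macro named after each supported extension,
        # so the pragma only appears where the extension actually exists.
        lines += [f"#ifdef {ext}", f"#pragma OPENCL EXTENSION {ext} : enable", "#endif"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(guarded_extensions("cl_khr_fp16"))
```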
Why not? It's much cooler.