taher

Results 51 comments of taher

Looks like there are duplicate entries in the model file, which causes the token_to_id dictionary to have fewer unique entries. The total count of dupes is 97, which matches the...
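To illustrate the effect described above, here is a minimal sketch of how duplicate tokens shrink a token-to-id mapping. The `tokens` list is made up for illustration; a real check would read the tokens from the model file instead.

```python
from collections import Counter

# Hypothetical vocabulary entries (assumption: a real run would parse the model file).
tokens = ["<s>", "hello", "world", "hello", "</s>", "world", "hello"]

# Building a dict keyed by token silently collapses duplicates:
# a later occurrence overwrites the id of an earlier one.
token_to_id = {tok: i for i, tok in enumerate(tokens)}

# Count every occurrence beyond the first as a duplicate.
counts = Counter(tokens)
num_dupes = sum(c - 1 for c in counts.values())

# The gap between the raw entry count and the dict size equals the dupe count.
missing = len(tokens) - len(token_to_id)
print(f"entries={len(tokens)} unique={len(token_to_id)} dupes={num_dupes}")
```

If the gap between the expected vocabulary size and `len(token_to_id)` equals `num_dupes`, duplicates alone account for the shortfall.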

> > Try narrowing that into a standalone C++ or Swift program. Does the fault still happen? > > Shame! it should be > > ``` > vDSP_vsmul(y, 1, (float*)...

As of the current state of test_dtypes.py, `test_int8_to_uint8_negative` fails on Mac Intel x86: ```python Tensor([-1,-2,-3,-4], dtype=dtypes.int8).cast(dtypes.uint8) ``` The relevant generated Metal kernel: ```c++ kernel void E_4(device unsigned char* data0, const device...
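For reference, a small stdlib-only sketch of what an int8 to uint8 cast produces under two's-complement wraparound. This assumes the test expects that wraparound behavior; it does not reproduce the Metal kernel or the tinygrad code path.

```python
import ctypes

# The same negative inputs used in the failing test.
values = [-1, -2, -3, -4]

# Two's-complement wraparound: keep only the low 8 bits of each value.
wrapped = [v & 0xFF for v in values]

# Cross-check against C's unsigned char conversion as exposed by ctypes.
via_ctypes = [ctypes.c_uint8(v).value for v in values]

print(wrapped)
print(via_ctypes)
```

Both lists should agree; a backend that saturates to 0 or goes through a float conversion instead would produce different values.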

How would the new `src` indexing work in cases where a 4D tensor is created, but `opt[0-2]` is being written? It would be out of bounds. For example: https://github.com/ggerganov/ggml/blob/master/src/ggml.c#L7268

I was looking at the PR linked to this issue. So it seems like GGML_MAX_OPT needed to be increased.

Is this a VM running on Intel or Apple Silicon? Your report says "hasUnifiedMemory: true", so I'm assuming it's Apple Silicon?

Could it be that Metal argument buffers aren't supported in a VM environment? A simple check could confirm that: `device->argumentBuffersSupport()`. The device must be Tier 2 to support argument buffers.

If it's reporting 0, then it's Tier 1 and most probably the argument buffer APIs won't work. Well, they may work, but with a lot of limitations.

Matrix factorizations aren't easily parallelizable on the GPU. Would QR and SVD only have CPU implementations for now? @awni

Note to self: almost all LAPACK routines are column-major. @awni would transposing an mlx array before sending it to LAPACK routines work here, or is there an alternative way?
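A small pure-Python sketch of why the layout matters: the flat buffer of a row-major matrix is identical to the column-major buffer of its transpose. The helper names are made up for illustration, and this says nothing about how mlx stores or passes arrays internally.

```python
def flatten_row_major(m):
    # Walk each row left to right (C order).
    return [x for row in m for x in row]

def flatten_col_major(m):
    # Walk each column top to bottom (Fortran/LAPACK order).
    rows, cols = len(m), len(m[0])
    return [m[r][c] for c in range(cols) for r in range(rows)]

def transpose(m):
    return [list(col) for col in zip(*m)]

a = [[1, 2, 3],
     [4, 5, 6]]

row_major_a = flatten_row_major(a)
col_major_a = flatten_col_major(a)
col_major_at = flatten_col_major(transpose(a))

# A column-major routine handed the row-major buffer of `a` effectively sees `a` transposed.
same_buffer = row_major_a == col_major_at
print(row_major_a, col_major_a, same_buffer)
```

So a column-major routine either needs an explicitly transposed copy, or the caller can reason about the factorization of the transpose and skip the copy where the math allows it.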