Daniel Chalef

Results 181 comments of Daniel Chalef

JAX is not yet fully supported on the Apple Silicon GPU. See https://github.com/google/jax/issues/8074#issuecomment-1148982985 for details. You may get it to work, albeit very slowly, using the CPU version of JAX.
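
As a minimal sketch of the CPU fallback, assuming a JAX release that honors the `JAX_PLATFORMS` environment variable (the variable must be set before `jax` is first imported; the helper name `active_jax_backend` is hypothetical and only for illustration):

```python
import os

# Assumption: JAX reads JAX_PLATFORMS when it is first imported,
# so the variable has to be set before the import below runs.
os.environ["JAX_PLATFORMS"] = "cpu"


def active_jax_backend():
    """Return the name of JAX's default backend, or None if JAX is not installed."""
    try:
        import jax
    except ImportError:
        return None
    return jax.default_backend()


backend = active_jax_backend()
print("JAX backend:", backend)
```

If JAX is installed, the reported backend should be the CPU one; expect it to be far slower than a GPU for model inference.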

All DALL-E models (the original OpenAI DALL-E, mini, mega, mega-fp16) output images at the same resolution. They're all based on the same model architecture. OpenAI's DALL-E 2 model architecture...

I was able to get the quantized `dalle-mini/dalle-mini/mega-1-fp16:latest` model to load into 12 GB of VRAM alongside VQGAN using the modifications I made to `app.py`. See my PR: https://github.com/saharmor/dalle-playground/pull/26

Apple's M1 appears to offer Intel AMX-like capabilities accessed via an arm64 ISA extension. This extension is in use by Apple's own Accelerate framework. Prototype code for using these matrix...

> That's not prototype code, nor an intrinsic header of sorts; that is an early attempt to document an undocumented co-processor.

Ulp. You're right. I scanned the gist too quickly. That's...

> Uh thanks, a deep link into a gist that looks as if it was supposed to be private, and has comments about being reverse-engineered from Apple's intellectual property ?...

I've run the benchmark suite on OpenBLAS (develop branch) compiled for:
- arm64 / VORTEX with LLVM shipped with Big Sur
- x86_64 using Homebrew's gcc-10 toolchain and run using...

> This version of benchmark/bench.h would probably work for OSX: [bench.h.txt](https://github.com/xianyi/OpenBLAS/files/5754961/bench.h.txt)

I get the following when making the tests with the modified `bench.h`: ```In file included from gemm.c:28: ./bench.h:82:21:...

> right, @danielchalef can you move that line `mach_timebase_info(&info);` into the getsec() function immediately after the `#elif defined(__APPLE__)` there, please?

The tests compiled. However, the math now appears...

@martin-frbg Your suggestion to set `OPENBLAS_LOOPS` to a larger number works. I'll upload `dgemm` results later today.
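
One plausible reason a larger loop count helps, offered as an assumption rather than something stated in the thread, is timer resolution: a single fast call can finish within one clock tick, while many repetitions span enough ticks to give a usable average. The sketch below illustrates the idea in Python with a deliberately coarse, hypothetical clock (`coarse_clock`, `time_per_call`, and `work` are illustrative names, not part of OpenBLAS):

```python
import time


def coarse_clock(resolution=1e-3):
    """Hypothetical low-resolution timer: perf_counter rounded down to `resolution` seconds."""
    return (time.perf_counter() // resolution) * resolution


def time_per_call(fn, loops, clock=coarse_clock):
    """Run `fn` `loops` times and return the average elapsed time per call in seconds."""
    start = clock()
    for _ in range(loops):
        fn()
    return (clock() - start) / loops


def work():
    # Small stand-in workload that finishes well within one clock tick.
    return sum(range(1000))


single = time_per_call(work, loops=1)       # often 0.0 with a coarse clock
averaged = time_per_call(work, loops=5000)  # spans many ticks, so it is measurable
print(f"1 loop: {single:.9f} s/call, 5000 loops: {averaged:.9f} s/call")
```

With one loop the coarse clock frequently reports zero elapsed time, which would make any derived rate meaningless; averaging over many loops avoids that.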