Devin Matthews
@marcinz please try 7d979b6.
During normal computation, the only global locking is done when checking out a block of memory from the global pool. This should scale roughly the same as if you were...
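To make the locking pattern concrete, here is a minimal sketch of a pool where the only shared lock is taken while a block is checked out (and, in this sketch, when it is handed back). Every name in it (`block_pool`, `pool_acquire`, and so on) is hypothetical, and the spinlock built on `atomic_flag` is an assumption chosen to keep the example self-contained; it is not the TBLIS implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not the TBLIS pool. */
typedef struct pool_block { struct pool_block *next; } pool_block;

typedef struct {
    atomic_flag lock;       /* the single global lock */
    pool_block *free_list;  /* blocks available for reuse */
    size_t      block_size; /* size in bytes of every block */
} block_pool;

static void pool_init(block_pool *p, size_t block_size)
{
    atomic_flag_clear(&p->lock);
    p->free_list  = NULL;
    /* A free block must be able to hold the list link. */
    p->block_size = block_size < sizeof(pool_block) ? sizeof(pool_block)
                                                    : block_size;
}

static void pool_lock(block_pool *p)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire)) { }
}

static void pool_unlock(block_pool *p)
{
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Check out a block. The lock is held only for the list pop;
   allocation of a fresh block happens outside it. */
static void *pool_acquire(block_pool *p)
{
    pool_lock(p);
    pool_block *b = p->free_list;
    if (b) p->free_list = b->next;
    pool_unlock(p);
    return b ? (void *)b : malloc(p->block_size);
}

/* Hand a block back so a later checkout can reuse it. */
static void pool_release(block_pool *p, void *ptr)
{
    assert(ptr != NULL);
    pool_block *b = ptr;
    pool_lock(p);
    b->next = p->free_list;
    p->free_list = b;
    pool_unlock(p);
}

/* Free every block still sitting in the pool. */
static void pool_destroy(block_pool *p)
{
    pool_lock(p);
    pool_block *b = p->free_list;
    p->free_list = NULL;
    pool_unlock(p);
    while (b) { pool_block *next = b->next; free(b); b = next; }
}
```

Between checkout and return, a thread works on its block without touching any shared state, so the lock is only contended for the few instructions of the list update.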
Basically laziness: I didn't want to go to the trouble of dealing with complex types spanning C++/C99/C89 and was just looking for something I could easily deal with on the...
And BTW thanks for the questions, I appreciate your interest in the project.
I went ahead and switched to a union and added some convenience functions with c888fcb109d31594d0c51dd374f71de682799ed6. Example:

```C
tblis_scalar s;
tblis_vector v;
s.data.d = 3.0;
tblis_init_vector_s(&v, 100, some_float_ptr, 1);
```
This looks good, but if you don't mind a bit of extra elbow grease, it would be great to tweak things such that we can drop in the cpuid code...
@eddy16112 TBLIS doesn't currently support PPC/POWER. In the near future TBLIS will work off of the [BLIS framework](https://github.com/flame/blis) which will allow support for x86_64, arm32, aarch64 and POWER.
Within 3 months or so is my best estimate.
> Having a clear interface and arch detection makes sense indeed, however without proper tuning, mergers/reviewers might not see this as a priority. Just guessing.

"The establishment" here. @everton1984 thanks...
The block sizes can, to some extent, be determined analytically, see https://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf. A basic non-analytical strategy is:

1. Run a series of problems with m=MR, n=NR, and increasing k. Note...
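As a rough illustration of step 1, the sweep over k could be driven by something like the sketch below. It is only a harness under stated assumptions: `MR = 6` and `NR = 8` are placeholder values, `ref_ukernel` is a naive stand-in for a real micro-kernel, and `time_k` and `sweep_k` are hypothetical helpers, none of which come from TBLIS or BLIS. It records a rate for each k and leaves the interpretation to the strategy above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

enum { MR = 6, NR = 8 }; /* assumed register block sizes, for illustration only */

/* Naive stand-in for a micro-kernel: C (MR x NR) += A (MR x k) * B (k x NR).
   A is stored as k columns of length MR, B as k rows of length NR. */
static void ref_ukernel(size_t k, const double *a, const double *b, double *c)
{
    for (size_t p = 0; p < k; p++)
        for (size_t i = 0; i < MR; i++)
            for (size_t j = 0; j < NR; j++)
                c[i*NR + j] += a[p*MR + i] * b[p*NR + j];
}

/* Time `reps` runs of the m=MR, n=NR problem at one value of k.
   Returns GFLOP/s, or 0 if allocation fails or the timer shows no elapsed time. */
static double time_k(size_t k, int reps)
{
    double *a = malloc(k * MR * sizeof *a);
    double *b = malloc(k * NR * sizeof *b);
    double  c[MR * NR] = {0};
    if (!a || !b) { free(a); free(b); return 0.0; }
    for (size_t i = 0; i < k * MR; i++) a[i] = 1.0;
    for (size_t i = 0; i < k * NR; i++) b[i] = 1.0;

    clock_t t0 = clock();
    for (int r = 0; r < reps; r++) ref_ukernel(k, a, b, c);
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    /* With all-ones inputs every entry of C equals k * reps; checking one
       entry validates the kernel and keeps the result from being discarded. */
    assert(c[0] == (double)k * reps);

    free(a);
    free(b);
    return secs > 0.0 ? 2.0 * MR * NR * k * reps / secs / 1e9 : 0.0;
}

/* Sweep k = k_min, 2*k_min, 4*k_min, ... up to k_max, storing each k and its
   rate. Returns the number of points recorded (at most `cap`). */
static size_t sweep_k(size_t k_min, size_t k_max, int reps,
                      size_t *ks, double *gflops, size_t cap)
{
    assert(k_min > 0); /* doubling from 0 would never advance */
    size_t n = 0;
    for (size_t k = k_min; k <= k_max && n < cap; k *= 2) {
        ks[n]     = k;
        gflops[n] = time_k(k, reps);
        n++;
    }
    return n;
}
```

Doubling k keeps the sweep short; a finer step near the region of interest would be a natural refinement. `clock()` is coarse, so `reps` needs to be large enough that each measurement takes a measurable amount of time.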