Michael Kowalski comments

Results: 41 comments by Michael Kowalski

[Feature Request] Allow StaticIntTuple mask for SIMD.shuffle

fixed by #2315

[mojo-stdlib] Create `InlineArray` type

There is a problem with how `__refitem__` provides `__getitem__` when `InlineArray` is used as an `alias`. Maybe since in this case `self` is immutable in `__refitem__`? A failing test to...

[Feature Request] More precise assertation failure information need to be provided

testing.assert_... functions now print the file and line number when there is a failure. See this [commit](https://github.com/modularml/mojo/commit/52ceff7b986d6022fd8c82a49702729b176de0d3). So I think this issue can be closed.

[BUG] Parameter closure captures are unsafe references

[Parameter closure captures are unsafe references](https://docs.modular.com/mojo/roadmap#parameter-closure-captures-are-unsafe-references).

[stdlib] Refactor slice to use Optionals and fix negative step slices

Could these tests using out of bounds values be added in `test_string.mojo`? There should probably be similar tests in `test_list.mojo`. The new `Slice.adjust()` itself has no tests yet. ``` assert_equal("",...
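For context on what such out-of-bounds tests would check, here is a minimal Python sketch. It assumes the Mojo slicing is meant to mirror Python's clamping rules, and it does not show the real `Slice.adjust()` API. The helper `clamped_slice` is a hypothetical name used only for illustration.

```python
def clamped_slice(text: str, start, stop, step=1) -> str:
    """Slice `text` by first clamping the bounds, as Python's slice.indices() does."""
    # slice.indices() resolves negative and out-of-range values against len(text).
    begin, end, stride = slice(start, stop, step).indices(len(text))
    return "".join(text[i] for i in range(begin, end, stride))


# Start beyond the end of the string: the result is empty, not an error.
empty = clamped_slice("abc", 5, 10)

# Negative step with both bounds out of range: the whole string, reversed.
reversed_all = clamped_slice("abc", 10, -10, -1)

# Stop far past the end: clamped to the string length.
tail = clamped_slice("abc", 1, 100)
```

Each case matches Python's built-in slicing (`"abc"[5:10]`, `"abc"[10:-10:-1]`, `"abc"[1:100]`), so similar cases could be written for both strings and lists.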

Upgrade torch and correct dim mismatch

The shape mismatch errors should be fixed in the models and not in the tests. The `_register_load_state_dict_pre_hook()` logic is not correct. Currently it is only applied in the base DistilBert...

Optimized rope_rotation_llama and apply temperature to logits with vectorization

I vectorized ROPE as you have a while back but the networks were much too small to see any impact. I also did micro benchmarks focused on the ROPE function...

Optimized rope_rotation_llama and apply temperature to logits with vectorization

I think this will get a performance improvement by removing the parallelize call on the loop over heads in ROPE and replacing it with a simple for loop. Setting up threads...

Optimized rope_rotation_llama and apply temperature to logits with vectorization

On M1 Pro removing parallelize is better. Not a huge difference in the whole network but clearly better. In isolated benchmarks at the size of the baby llama models parallelizing...

Optimized rope_rotation_llama and apply temperature to logits with vectorization

I put the comparisons I did in this [branch](https://github.com/mikowals/llama2.mojo/tree/no-parallelize-rope). In the graphs above, "V1" is current master and "V2" is with the two-line change to remove parallelize...