chenyu

@tinygrad make the easy things easy, and the hard things possible

Results: 80 issues by chenyu

llama2 70B OOM on tinybox without casting weight from bf16 to float16

1 comment

tinybox HSA supports bf16 buffers, so we can use the bfloat16 weights directly and don't need https://github.com/tinygrad/tinygrad/blob/639bd5dbfcef4e329b6fbba2de571c8cf70ee95b/examples/llama.py#L179-L180. With that commented out, `python3 examples/llama.py --gen 2 --size 7B --shard 6 --prompt "Hello." --count...

make bf16 training work

support changing default_float to bfloat16 and train cifar with bf16
- [x] bf16 Tensor creation and numpy #3724 #3747
- [x] bf16 rand #3764
- [x] move bf16 cast hack...

MLPerf Submission

training submission deadline is 5/10: https://docs.google.com/spreadsheets/d/1oeWuZ2GHb0r2d_v5gqUWVXhXVdTBTm0Qn_svyWYZPr4/edit#gid=538466233
submission rules: https://github.com/mlcommons/policies/blob/master/submission_rules.adoc#42-single-submission-round-schedule

Pad resnet data for MLPerf

We currently drop the last incomplete batch. MLPerf eval needs to use every sample exactly once, so the last batch has to be padded instead of dropped.
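A minimal sketch of the padding idea, in plain Python rather than tinygrad code. The helper name `padded_batches`, the `pad_value` filler, and the boolean mask are all hypothetical; the point is only that padding plus a mask keeps a fixed batch size while counting each real sample once.

```python
from typing import Iterator, List, Sequence, Tuple

def padded_batches(samples: Sequence[int], batch_size: int,
                   pad_value: int = 0) -> Iterator[Tuple[List[int], List[bool]]]:
    """Yield (batch, mask) pairs; the last batch is padded up to batch_size.

    mask[i] is True for real samples and False for padding (hypothetical scheme).
    """
    for start in range(0, len(samples), batch_size):
        batch = list(samples[start:start + batch_size])
        n_real = len(batch)
        n_pad = batch_size - n_real
        yield batch + [pad_value] * n_pad, [True] * n_real + [False] * n_pad

# 10 samples with batch size 4: the last batch holds 2 real samples and 2 pads.
batches = list(padded_batches(list(range(10)), batch_size=4))

# Only unmasked entries count toward eval, so every sample is used exactly once.
seen = [x for batch, mask in batches for x, keep in zip(batch, mask) if keep]
```

The mask would then be applied when accumulating eval metrics so the padded entries do not affect the score.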

Fix softmax.argmax issue

5 comments

argmax is two steps: finding the max, then finding the index of that max. The max-finding step was fused with softmax and introduced numerical error. The resulting max can be different from any of the tensor elements,...
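A small illustration of why that matters, in plain Python rather than tinygrad code. The two-step helper `argmax_two_step` is hypothetical, and the error is simulated by nudging the max by one ulp with `math.nextafter` (Python 3.9+); the real error came from kernel fusion, which this sketch does not reproduce.

```python
import math
from typing import List, Optional

def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax on a plain list."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def argmax_two_step(values: List[float], max_value: float) -> Optional[int]:
    """Step 2 of argmax: first index whose value equals the given max, else None."""
    for i, v in enumerate(values):
        if v == max_value:
            return i
    return None

probs = softmax([1.0, 3.0, 2.0])

# Max taken from the same values that are compared: the lookup succeeds.
exact_max = max(probs)
idx_ok = argmax_two_step(probs, exact_max)

# Simulated stand-in for a max computed along a different numerical path:
# it is off by one ulp, so it equals no element and the lookup fails.
perturbed_max = math.nextafter(exact_max, math.inf)
idx_bad = argmax_two_step(probs, perturbed_max)
```

Here `idx_ok` is the index of the largest logit, while `idx_bad` is `None` because no element compares equal to the perturbed max.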

Symbolic 2.0

Tracking issue. The goal is to cleanly separate sint, used for symbolic shapes, from the symbolic template used to generate kernel indices.
- sint does not need to include all...

Test passing Variables as size 1 Tensor instead of const pointer

check perf impact to see if it justifies the added complexity to support pointer arg

Implement python array API

3 comments

spec: https://data-apis.org/array-api/2022.12/API_specification/index.html tests: https://github.com/data-apis/array-api-tests

disable TC in action with amt=1

6 comments

open for investigation. `BEAM=2 python examples/handcode_resnet50_opt.py` on METAL M1 Max:
1. master: 297ms
2. removed all TC: 206ms
3. keep only TC axis=0: 186ms

include negative float in test_dtype

2 comments

except casting a negative float to an unsigned integer, which is undefined
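One way the exclusion could look when generating cast test cases, as a rough sketch in plain Python. The dtype names are plain strings and `cast_cases` is a hypothetical helper; none of this is taken from the actual test_dtype code.

```python
from itertools import product
from typing import Iterator, Tuple

# Hypothetical test inputs: negative floats are now included.
FLOAT_VALUES = [-3.5, -1.0, 0.0, 1.0, 3.5]
# Hypothetical target dtypes, named by string for illustration only.
DTYPES = ["float32", "int8", "int32", "uint8", "uint32"]

def is_unsigned(dtype: str) -> bool:
    """True for the unsigned integer dtype names used in this sketch."""
    return dtype.startswith("uint")

def cast_cases() -> Iterator[Tuple[float, str]]:
    """Yield (value, target dtype) pairs, skipping negative float -> unsigned."""
    for value, dst in product(FLOAT_VALUES, DTYPES):
        if value < 0 and is_unsigned(dst):
            continue  # undefined per the issue, so there is no expected result to check
        yield value, dst

cases = list(cast_cases())
```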
