M. Yusuf Sarıgöz
Unfortunately, that theoretically small divergence between GELU and Quick GELU led to large differences at the end; I suppose it accumulates through the 12 layers. So I couldn't get good results...
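To put a number on that per-activation divergence, here is a minimal sketch that compares the exact (erf-based) GELU with the sigmoid-based Quick GELU, `x * sigmoid(1.702 * x)`. The helper name `max_abs_gap` and the scan range are assumptions made for illustration; this is not code from ggml or clip.cpp.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def quick_gelu(x: float) -> float:
    """Quick GELU: x * sigmoid(1.702 * x)."""
    return x / (1.0 + math.exp(-1.702 * x))


def max_abs_gap(lo: float = -6.0, hi: float = 6.0, steps: int = 1201) -> float:
    """Largest |gelu(x) - quick_gelu(x)| on an evenly spaced grid (hypothetical helper)."""
    step = (hi - lo) / (steps - 1)
    return max(abs(gelu(lo + i * step) - quick_gelu(lo + i * step))
               for i in range(steps))


if __name__ == "__main__":
    print(f"max |GELU - QuickGELU| on [-6, 6]: {max_abs_gap():.4f}")
```

The gap for a single activation is small, but it is applied at every MLP block, so it is plausible that it compounds across layers as described above.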
The failing test is `test-grad0`, which is [also failing in master](https://github.com/ggerganov/ggml/actions/runs/5315086358/jobs/9623050487) due to a timeout.
Unfortunately, it didn't work. It first increased the memory requirement for the computation buffer, and when I allocated the required memory, the NaN issue came back. But I believe that...
I think it's more related to the kernel data (`src0`) not being prepared in `wdata`, unlike `src1`. I'm trying to understand the memory layout there.
Thanks, I'll dig deeper into it later on. Now that this is merged, I'll raise a PR to add a link to clip.cpp shortly.
> rename ggml_graph_compute_make_plan() to ggml_graph_plan()

I would suggest `ggml_cplan_make()`: it is both short, as intended, and consistent with the struct naming.
I'm afraid defining a closed metadata vocabulary might be a restrictive design that slows down innovation in the GGML community. My suggestion would be to define a...
I'm surprised by `ggml_norm`. It works in the feature dimension, e.g., `ne00`, as it should, but it gives a different result for the second one of two identical samples in...
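The expectation behind that surprise can be written down as a small reference check: a normalization that works only along the feature dimension treats each sample independently, so two identical samples in a batch must produce identical outputs. The sketch below is a plain-Python reference, not ggml's implementation; the function name `norm_rows` and the `eps` value are assumptions.

```python
import math


def norm_rows(rows, eps=1e-5):
    """Normalize each row to zero mean and unit variance along the feature dimension.

    Hypothetical reference helper; eps is an assumed value.
    """
    out = []
    for row in rows:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        scale = 1.0 / math.sqrt(var + eps)
        out.append([(v - mean) * scale for v in row])
    return out


if __name__ == "__main__":
    batch = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]  # two identical samples
    first, second = norm_rows(batch)
    print(first == second)
```

If a reference like this agrees on both samples while the computed graph does not, the discrepancy more likely comes from how the second sample is read or written in memory than from the normalization math itself.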
> Are you accumulating the sum into a double?

Yes, it's just for debugging, so I accumulate the sum into a double. And the actual issue is, output vectors for...
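The reason the accumulator type matters for a debugging checksum can be sketched as follows. The example emulates a 32-bit float accumulator in Python by round-tripping through `struct`, and compares it with a double accumulator over the same float32 inputs. The helper names and the input values are made up for illustration.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (double) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values) -> float:
    """Accumulate in emulated float32: round after every addition."""
    acc = 0.0
    for v in values:
        acc = to_f32(acc + v)
    return acc


def sum_f64(values) -> float:
    """Accumulate in double precision (Python's native float)."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


if __name__ == "__main__":
    values = [to_f32(0.1)] * 100_000   # float32 inputs, as tensor data would be
    reference = math.fsum(values)      # correctly rounded sum of the inputs
    print("float32 accumulator error:", abs(sum_f32(values) - reference))
    print("double accumulator error: ", abs(sum_f64(values) - reference))
```

With a double accumulator, the rounding error of the checksum itself stays far below the differences being hunted, so a mismatch between two sums points at the data rather than at the summation.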
Yes, it turned out that I had calculated the wrong value for the offset to be added to the `wdata` pointer in `conv_2d_sk_p0`. So even if input 0 gives...