compilade
@mofosyne I agree with @Galunid regarding the overhead (both CPU-wise and memory-wise). This also has the exact same problems as the UUID autogeneration, because the hash for an `f32` model...
@mofosyne

> That's a bit strange, does llama-gguf-hash also show difference?

No, the difference is only in the metadata, because of the hash introduced in this PR, which differs, because...
Some progress report: I have a local branch (not yet public) on top of #8526 in which I've started implementing the graph for Mamba-2. The conv step is very similar...
Okay, the fully recurrent mode works for `Mamba-2`! (for the curious, see this branch: ) I'll open a PR soon (in the next few days; I still need to clean up some...
Heads up that #15625 fixes a problem in the implementation of `SSM_SCAN`, which makes this model (Mamba-Codestral-7B-v0.1) better than it was when initially implemented here. So if you had some...
> @compilade I recall you had an observation about potential issues with autogenerating uuids

@mofosyne Yes, there are possible problems.

- Should the UUID of a model be the same...
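The equivalence concern above can be illustrated with a small sketch: the same value serialized at different precisions yields different raw bytes, so a byte-level hash (and any UUID derived from it) differs even when the model is semantically the same after quantization. This is a standalone illustration, not code from the PR.

```python
import hashlib
import struct

# Illustrative only: the same value serialized as f32 vs f16 produces
# different raw bytes, so a byte-level hash differs even though the
# number (and, after quantization, the model) is semantically the same.
value = 1.0
f32_bytes = struct.pack("<f", value)  # 4 bytes, IEEE single precision
f16_bytes = struct.pack("<e", value)  # 2 bytes, IEEE half precision

f32_hash = hashlib.sha256(f32_bytes).hexdigest()
f16_hash = hashlib.sha256(f16_bytes).hexdigest()
print(f32_hash == f16_hash)  # prints False
```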
@mofosyne Hashing the *source tensors* could work without making the memory usage too high (because they are `mmap`-ed), and would also solve the other equivalence problems, since the semantics of...
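A minimal sketch of the idea: stream the source tensor bytes through a hash and derive a deterministic UUID from the digest. The function name matches `generate_source_tensors_uuid()` mentioned later in the thread, but the body here is an assumption, not the actual implementation; because the input is streamed in chunks, `mmap`-ed tensor data can be hashed without loading everything into memory at once.

```python
import hashlib
import uuid

def generate_source_tensors_uuid(tensor_chunks):
    # Hypothetical sketch (not the actual implementation): derive a
    # deterministic UUID from the raw bytes of the source tensors.
    # tensor_chunks is any iterable of bytes (e.g. mmap-ed slices),
    # so memory usage stays low.
    h = hashlib.sha256()
    for chunk in tensor_chunks:
        h.update(chunk)
    # Fold the first 16 bytes of the digest into a UUID.
    return uuid.UUID(bytes=h.digest()[:16])
```

The UUID is stable for identical tensor contents and changes whenever any tensor byte changes.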
> you mean like `generate_source_tensors_uuid()` in this?

@mofosyne Yes, pretty much. This reads all of the source tensors twice (so it's slow), but I don't really see a way around that...
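One plausible reason for the double read: a content-derived id belongs in the file's metadata, which is written before the tensor data, so the data must be scanned once to compute the hash and again to write it out. A minimal sketch, assuming a simplified layout (a 16-byte content id followed by raw data; the real GGUF layout is more involved):

```python
import hashlib
import io

def write_with_content_uuid(tensor_chunks, out):
    # Hypothetical two-pass writer: the content id lives in the header,
    # which precedes the tensor data, so the chunks are traversed twice.
    # tensor_chunks must be re-iterable (e.g. a list, not a generator).
    h = hashlib.sha256()
    for chunk in tensor_chunks:   # pass 1: hash the data
        h.update(chunk)
    out.write(h.digest()[:16])    # simplified "header": 16-byte content id
    for chunk in tensor_chunks:   # pass 2: write the data
        out.write(chunk)

buf = io.BytesIO()
write_with_content_uuid([b"tensor-data"], buf)
```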
Author of #8526 here.

> Why can this happen?

Basically, it should not happen if it worked before. It's possible the internal changes in batch splits caused some external changes...
> Thank you for the answer! Changing default idx to `-1` helps, the error no longer occurs.

That is very good to know!

> It looks like the feature was...