Diego Devesa
> ggml_graph_compute_plan() MUST be called because it also sets node->n_tasks. The work_size depends on n_tasks.

I think that `n_tasks` should be removed from `ggml_tensor`. For now, the easiest way to address...
> Of course, `n_tasks` should belong to the compute facility I think, it's ideal to migrate to some place else.

Yes, precisely! That's what I was thinking as well. I am...
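To make that concrete, here is a minimal sketch of what moving `n_tasks` out of `ggml_tensor` and into a plan/context object could look like. The field layout and exact signatures below are assumptions for illustration, not the final API:

```cpp
#include <stddef.h>
#include "ggml.h"

// Hypothetical sketch: per-node task counts and the work buffer size live in the
// plan/context object instead of in ggml_tensor.
struct ggml_cgraph_context {
    int    n_threads;                 // requested thread count
    size_t work_size;                 // required size of the work buffer
    void * work_data;                 // caller-provided work buffer of work_size bytes
    int    n_tasks[GGML_MAX_NODES];   // per-node task count, indexed like cgraph->nodes
};

// Pass 1: walk the graph, fill in n_tasks and work_size, compute nothing.
struct ggml_cgraph_context ggml_graph_compute_plan(struct ggml_cgraph * cgraph, int n_threads);

// Pass 2: execute the graph using the precomputed plan.
void ggml_graph_compute(struct ggml_cgraph_context * ctx, struct ggml_cgraph * cgraph);
```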
That's very similar to what I have been thinking. I am working on a CUDA implementation that can execute `ggml_cgraphs` directly, and what it needs to do that is very...
What I was thinking is that `n_threads` could be a parameter to `ggml_graph_compute_plan`, and it would also be stored in `ggml_cgraph_context` for use by `ggml_graph_compute`. For now, the CUDA runner...
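As a usage sketch under those assumptions (building on the types above; the allocation is done naively here just to show the flow):

```cpp
#include <stdlib.h>

// Sketch only: n_threads is given once at planning time and carried in the context,
// so ggml_graph_compute needs no separate thread argument.
static void run_graph(struct ggml_cgraph * gf, int n_threads) {
    struct ggml_cgraph_context ctx = ggml_graph_compute_plan(gf, n_threads);
    ctx.work_data = malloc(ctx.work_size);   // work_size was derived from the per-node n_tasks
    ggml_graph_compute(&ctx, gf);
    free(ctx.work_data);
}
```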
Looks good, I only have a few minor nits:

- In llama.cpp, to avoid allocations in every eval, the work buffer memory could be stored as a `std::vector` in `llama_context`...
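A sketch of that first nit, reusing the hypothetical plan API from above; `work_buffer` is an invented member name standing in for whatever `llama_context` would actually use:

```cpp
#include <cstdint>
#include <vector>

struct llama_context_sketch {              // stand-in for the relevant part of llama_context
    std::vector<uint8_t> work_buffer;      // persists across evals, so no allocation per eval
};

static void eval_graph(llama_context_sketch & lctx, struct ggml_cgraph * gf, int n_threads) {
    struct ggml_cgraph_context ctx = ggml_graph_compute_plan(gf, n_threads);
    if (lctx.work_buffer.size() < ctx.work_size) {
        lctx.work_buffer.resize(ctx.work_size);   // grows only when a larger graph shows up
    }
    ctx.work_data = lctx.work_buffer.data();
    ggml_graph_compute(&ctx, gf);
}
```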
I think this looks good.

> A positive side-effect is that the user can now control the number of tasks for each op. This can be utilized also when creating custom...
The LoRA files are very simple currently, it's just a tiny header with a few parameters and a bunch of tensors. I think it should work fine with the way...
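For illustration only, the kind of layout that description implies: a small fixed header with the LoRA parameters, followed by the adapter tensors. The field names here are a guess, not the actual file format:

```cpp
#include <cstdint>

// Illustrative sketch of a "tiny header with a few parameters" for a LoRA file.
struct lora_file_header_sketch {
    uint32_t magic;     // file identifier
    uint32_t version;
    int32_t  r;         // LoRA rank
    int32_t  alpha;     // LoRA scaling parameter
};
// ...followed by the serialized loraA/loraB tensors for each adapted weight.
```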
Not sure if @ggerganov agrees, but I think that the best way to do this may be a simple macro that has all the variables for the 3 tensors src0/src1/dst,...
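Something along these lines, as a sketch; the macro names are invented here, but they expand to the ne/nb locals that ggml ops already declare by hand (`ne00`/`nb00` for src0, `ne10`/`nb10` for src1, `ne0`/`nb0` for dst):

```cpp
#include <stdint.h>
#include "ggml.h"

// Hypothetical helper: declare the four ne/nb locals for one tensor.
#define TENSOR_DIM_LOCALS(t, ne_pfx, nb_pfx)                                        \
    const int64_t ne_pfx##0 = (t)->ne[0]; const size_t nb_pfx##0 = (t)->nb[0];      \
    const int64_t ne_pfx##1 = (t)->ne[1]; const size_t nb_pfx##1 = (t)->nb[1];      \
    const int64_t ne_pfx##2 = (t)->ne[2]; const size_t nb_pfx##2 = (t)->nb[2];      \
    const int64_t ne_pfx##3 = (t)->ne[3]; const size_t nb_pfx##3 = (t)->nb[3];

// Hypothetical macro for a binary op: all the locals for src0, src1 and dst in one line,
// instead of repeating the same boilerplate at the top of every op implementation.
#define TENSOR_BINARY_OP_LOCALS(src0, src1, dst) \
    TENSOR_DIM_LOCALS(src0, ne0, nb0)            \
    TENSOR_DIM_LOCALS(src1, ne1, nb1)            \
    TENSOR_DIM_LOCALS(dst,  ne,  nb)
```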
I think there is some overlap between this and the plan to implement mixed CPU/GPU evaluation in llama.cpp by splitting the graph into multiple parts and running each of them...
The idea is not to make the splits automatically; the programmer will still need to choose where to make these splits, and the user will need to specify what backend...
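A rough sketch of what that could look like from the programmer's side; everything here (type names, backends, helper calls) is hypothetical and only meant to show that the cut points are chosen explicitly while the backend assignment comes from the user:

```cpp
// Hypothetical types: the programmer decides where the graph is cut,
// the user decides which backend runs each part.
enum backend_kind { BACKEND_CPU, BACKEND_CUDA };

struct graph_split {
    struct ggml_cgraph * graph;    // sub-graph for this part
    enum backend_kind    backend;  // selected by the user, e.g. via a command-line option
};

// Example shape of a manual split (helper functions are placeholders):
// struct graph_split splits[] = {
//     { build_input_part(ctx),  BACKEND_CPU  },   // embeddings / input processing
//     { build_layers_part(ctx), BACKEND_CUDA },   // repeating transformer layers
//     { build_output_part(ctx), BACKEND_CPU  },   // output head
// };
```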