I made you a GUI with quantization and CPU options.
https://github.com/angrysky56/llada_gui
Thanks for your work~ We will try it later.
I am not a coder, but I'm trying to set up some basic improvements. I tried to convert the model to ONNX, but my PC is too weak to run the conversion, let alone test it. Still, I got it about 10x faster and able to run on a 12 GB NVIDIA GPU.
The diffusion process in LLaDA offers unique opportunities for interaction that traditional autoregressive LLMs don't have. Here are some novel ways we could interact with and guide the diffusion process:
Novel Interactions with LLaDA's Diffusion Process
1. Guided Diffusion with External LLM Feedback
Create a system where Claude or another LLM provides real-time guidance during the diffusion process (a sketch follows the list):
- At specific step intervals (e.g., every 20 steps), pause the diffusion
- Show the partially completed tokens to Claude
- Claude provides guidance on which tokens seem most promising
- Use this feedback to adjust token confidence scores before continuing
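Concretely, the pause-and-ask loop might look like the sketch below. Everything here is an assumption: `step_fn`, `decode_fn`, `remask_fn`, and `guide_fn` are hypothetical callables standing in for the LLaDA step code and whatever API wrapper you use for Claude; none of them exist in the repo.

```python
def guided_generate(step_fn, decode_fn, remask_fn, guide_fn,
                    total_steps=128, interval=20):
    """Pause diffusion every `interval` steps and let an external LLM
    flag tokens to re-mask. All four callables are assumptions:
      step_fn()            -- advance one denoising step
      decode_fn()          -- return the current partial text
      remask_fn(positions) -- re-mask the flagged token positions
      guide_fn(prompt)     -- external LLM call, returns e.g. "3, 17, 40"
    """
    for step in range(1, total_steps + 1):
        step_fn()
        if step % interval == 0 and step < total_steps:
            feedback = guide_fn(
                "Partial diffusion output:\n" + decode_fn()
                + "\nReply with the comma-separated indices of tokens "
                  "that look wrong, or 'none'."
            )
            bad = [int(p) for p in feedback.split(",") if p.strip().isdigit()]
            remask_fn(bad)  # flagged positions get re-denoised in later steps
```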
2. Multi-Model Token Verification
Implement a token verification pipeline (sketched in code after the list):
- LLaDA proposes candidate tokens through diffusion
- Other models (like different size LLMs) vote on token quality
- Create an ensemble approach where tokens need consensus to be unmasked
- This could significantly improve accuracy by combining diffusion's creativity with autoregressive models' precision
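A minimal sketch of the consensus rule, assuming each verifier exposes some way to score a candidate token in context (e.g. from its own logits at that position); the `verifier_scores` callables are stand-ins, not an existing API:

```python
def consensus_unmask(candidates, verifier_scores, threshold=0.5, min_votes=2):
    """candidates: list of (position, token_id, llada_confidence) tuples
    proposed by LLaDA's diffusion step.
    verifier_scores: one callable per verifier model, mapping
    (position, token_id) -> probability that the token fits the context.
    A candidate is unmasked only when enough verifiers agree."""
    accepted = []
    for pos, tok, _conf in candidates:
        votes = sum(1 for score in verifier_scores
                    if score(pos, tok) >= threshold)
        if votes >= min_votes:
            accepted.append((pos, tok))
    return accepted  # only these positions get unmasked this step
```

Tokens that fail the vote simply stay masked and get another chance on a later step, which is why this fits diffusion better than autoregressive decoding.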
3. Semantic Steering with Embeddings
Implement semantic guidance during diffusion (see the sketch below):
- Define target semantic directions using embedding models
- As tokens are generated, compute their embeddings
- Apply gentle forces to guide the generation toward desired semantic spaces
- This could allow "steering" the response toward certain topics or styles
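One cheap way to implement the "gentle force" is a logit bias over the top-k candidates at a masked position, scored against a target embedding. The sketch below uses sentence-transformers; the blending weight `alpha` and the choice to re-score only the top 50 candidates are arbitrary assumptions:

```python
import torch
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def steer_logits(logits, tokenizer, target_text, k=50, alpha=2.0):
    """Bias one masked position's logits toward a semantic target.
    Only the top-k candidates are embedded, to keep the step cheap."""
    _, top_ids = torch.topk(logits, k)
    words = [tokenizer.decode([i]) for i in top_ids.tolist()]
    cand = torch.tensor(encoder.encode(words))             # (k, 384)
    target = torch.tensor(encoder.encode([target_text]))   # (1, 384)
    sims = torch.nn.functional.cosine_similarity(cand, target, dim=-1)
    biased = logits.clone()
    biased[top_ids] += alpha * sims.to(biased.dtype)  # a nudge, not a hard mask
    return biased
```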
4. Interactive Diffusion Interface
Create an interface allowing direct human interaction during the diffusion process (the state it needs is sketched after the list):
- Visualize tokens as they're being generated
- Allow users to "lock" good tokens they want to keep
- Let users "reject" tokens they don't want
- Enable users to provide hints for masked areas
- This turns generation into a collaborative process
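Independent of GUI toolkit, the interface boils down to a constraint set over positions that filters logits before each sampling step. A minimal sketch of that state (all names illustrative; hints would have to be injected elsewhere, e.g. into the prompt):

```python
from dataclasses import dataclass, field

@dataclass
class InteractiveState:
    """User decisions collected during diffusion."""
    locked: dict = field(default_factory=dict)    # position -> token_id to keep
    rejected: dict = field(default_factory=dict)  # position -> set of banned ids
    hints: dict = field(default_factory=dict)     # position -> free-text hint

    def apply(self, position, logits):
        """Filter one position's logits in place before sampling."""
        if position in self.locked:
            logits[:] = float("-inf")
            logits[self.locked[position]] = 0.0   # force the locked token
        for banned in self.rejected.get(position, ()):
            logits[banned] = float("-inf")        # never sample rejected ids
        return logits
```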
5. Adaptive Masking Strategies
Implement more sophisticated masking strategies (example schedule below):
- Use contextual importance to determine mask scheduling
- Key structural tokens (like those in subject positions) remain masked longer
- Content tokens get resolved earlier
- This could improve coherence significantly
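The scheduling itself can be a small function once you have per-position importance scores (from a parser, attention weights, or heuristics; how to get them is the open question). A sketch with made-up numbers:

```python
def unmask_schedule(importance, total_steps):
    """importance: one float in [0, 1] per position, higher = more
    structurally important. Returns the earliest step at which each
    position may be unmasked: important slots resolve later, after
    the surrounding context has firmed up."""
    return [int(imp * (total_steps - 1)) for imp in importance]

# Example: three positions, the middle one in a key structural slot.
print(unmask_schedule([0.1, 0.9, 0.2], total_steps=100))
# -> [9, 89, 19]: position 1 stays masked until step 89
```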
6. Cross-Modal Guidance
Use vision models to guide text diffusion (a CLIP-based sketch follows):
- For topics with visual components, use image models to verify coherence
- Example: If generating text about "a red car on a bridge," verify that the combined tokens create embeddings similar to images of red cars on bridges
- This creates a subtle alignment between text and visual worlds
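CLIP's shared text/image embedding space makes a rough version of this checkable today. A sketch using Hugging Face's CLIP (the model ID is real; the idea of re-masking when the score drops is the assumption):

```python
import torch
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def visual_coherence(partial_text, reference_images):
    """Mean cosine similarity between the partial generation and PIL
    reference images of the intended scene; a low score could trigger
    re-masking of the offending span."""
    inputs = processor(text=[partial_text], images=reference_images,
                       return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        out = clip(**inputs)
    text_emb = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    img_emb = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    return (text_emb @ img_emb.T).mean().item()

# e.g. visual_coherence("a red car on a bridge", [Image.open("car.jpg")])
```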
7. LoRA-Based Diffusion Control
Implement specialized LoRAs that work directly with the diffusion process (a config sketch follows):
- Train small LoRA adapters specifically for controlling diffusion dynamics
- These could provide stylistic control, domain-specific knowledge, etc.
- The LoRAs would directly influence which tokens get unmasked when
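With the PEFT library, attaching such an adapter is mostly configuration. The `target_modules` names below are guesses; inspect LLaDA's actual module names via `model.named_modules()` before using this:

```python
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=8,                      # adapter rank: small on purpose
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # GUESS: check the real names
)

# base_model: the loaded LLaDA model (loading omitted here)
model = get_peft_model(base_model, config)
model.print_trainable_parameters()  # only the tiny adapter is trainable
```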
8. Dynamic Confidence Thresholding
Rather than using a fixed confidence threshold (a simple schedule is sketched below):
- Learn optimal unmasking thresholds for different token positions and contexts
- Early tokens might need higher confidence
- Later tokens could use lower thresholds once context is established
- This adaptively manages the diffusion process based on position
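Before learning anything, a hand-set schedule already captures the idea: start strict and decay toward a floor as context accumulates. All three constants below are arbitrary starting points, not learned values:

```python
import math

def position_threshold(pos, length, hi=0.9, lo=0.6, decay=3.0):
    """Confidence required to unmask the token at `pos` in a response
    of `length` tokens: exponential decay from `hi` to `lo`."""
    frac = pos / max(length - 1, 1)
    return lo + (hi - lo) * math.exp(-decay * frac)

print(position_threshold(0, 64))   # 0.90 -- early tokens need high confidence
print(position_threshold(63, 64))  # ~0.61 -- late tokens ride on context
```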
9. Memory-Augmented Diffusion
Integrate an external memory system (sketched after the list):
- Store important context and knowledge in a vector database
- During diffusion, query this memory to guide token selection
- This adds long-context capabilities to LLaDA
- Perfect for document-grounded or knowledge-intensive tasks
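A sketch using FAISS as the store and sentence-transformers for embeddings. How retrieved text is injected back into the diffusion step (prepended to the prompt, used to bias logits, etc.) is deliberately left open:

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

class DiffusionMemory:
    def __init__(self):
        dim = encoder.get_sentence_embedding_dimension()
        # inner product == cosine similarity for normalized vectors
        self.index = faiss.IndexFlatIP(dim)
        self.texts = []

    def add(self, text):
        vec = encoder.encode([text], normalize_embeddings=True)
        self.index.add(np.asarray(vec, dtype="float32"))
        self.texts.append(text)

    def query(self, partial_generation, k=3):
        """Retrieve memories relevant to the current partial output;
        the caller decides how to condition the next steps on them."""
        vec = encoder.encode([partial_generation], normalize_embeddings=True)
        _, idx = self.index.search(np.asarray(vec, dtype="float32"), k)
        return [self.texts[i] for i in idx[0] if 0 <= i < len(self.texts)]
```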
10. Prototype Implementation
For a first implementation, I'd suggest starting with option 1 or option 4:
- For option 1 (LLM guidance): Create a simple API connection between LLaDA and Claude
- For option 4 (interactive interface): Extend the visualization tab to allow clicking on tokens
For 1 and 2: is it possible to use another LLaDA instance, running on a second GPU or the CPU, to check itself for errors and go back and regenerate (error repair and quality improvement, if that was the logic here)?
Self-correcting technology :)
Also an option to update its own weights to prevent the errors it finds.
Self-improving technology.
It would also be very good to have quality tests in the GUI: GSM8K and others.
CPU is much slower than GPU. I'm not sure how the AI set up the GPU offloading, but it's working pretty well. Possibly LoRA adapters or much smaller LLaDA models could be run on the CPU in tandem effectively on limited PCs.
I will see what I can do with this. https://github.com/EleutherAI/lm-evaluation-harness
Got sidetracked: I may have created "cognitive diffusion" with a vector DB, but currently it is not hooked into the process except as a demo. Integrating something like this: https://github.com/synthience/mcp-titan-cognitive-memory
Edit: Cognitive Diffusion memory is now integrated and working, though slower and more memory-intensive. Posted to the repo; sorry the repo is a mess.
New version. It was supposed to be a cleanup, but now the vector DB integration works, to a degree at least. https://github.com/angrysky56/llada_gui_new
Edit: Prototype training now available to work on!