I made you a GUI with quantization and CPU options.
https://github.com/angrysky56/llada_gui
Thanks for your work~ We will try it later.
I am not a coder, but I'm trying to set up some basic improvements. I tried to convert the model to ONNX, but my PC is too weak to run the conversion, let alone test it. Still, I got it about 10x faster and able to run on a 12 GB NVIDIA GPU.
The diffusion process in LLaDA offers unique opportunities for interaction that traditional autoregressive LLMs don't have. Here are some novel ways we could interact with and guide the diffusion process:
Novel Interactions with LLaDA's Diffusion Process
1. Guided Diffusion with External LLM Feedback
Create a system where Claude or another LLM provides real-time guidance during the diffusion process (a sketch follows the list):
- At specific step intervals (e.g., every 20 steps), pause the diffusion
- Show the partially completed tokens to Claude
- Claude provides guidance on which tokens seem most promising
- Use this feedback to adjust token confidence scores before continuing
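Concretely, the pause-and-ask loop might look like the sketch below. Everything here is an assumption: `step_fn`, `decode_fn`, `remask_fn`, and `guide_fn` are hypothetical callables standing in for the LLaDA step code and whatever API wrapper you use for Claude; none of them exist in the repo.

```python
def guided_generate(step_fn, decode_fn, remask_fn, guide_fn,
                    total_steps=128, interval=20):
    """Pause diffusion every `interval` steps and let an external LLM
    flag tokens to re-mask. All four callables are assumptions:
      step_fn()            -- advance one denoising step
      decode_fn()          -- return the current partial text
      remask_fn(positions) -- re-mask the flagged token positions
      guide_fn(prompt)     -- external LLM call, returns e.g. "3, 17, 40"
    """
    for step in range(1, total_steps + 1):
        step_fn()
        if step % interval == 0 and step < total_steps:
            feedback = guide_fn(
                "Partial diffusion output:\n" + decode_fn()
                + "\nReply with the comma-separated indices of tokens "
                  "that look wrong, or 'none'."
            )
            bad = [int(p) for p in feedback.split(",") if p.strip().isdigit()]
            remask_fn(bad)  # flagged positions get re-denoised in later steps
```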
2. Multi-Model Token Verification
Implement a token verification pipeline (sketched in code after the list):
- LLaDA proposes candidate tokens through diffusion
- Other models (like different size LLMs) vote on token quality
- Create an ensemble approach where tokens need consensus to be unmasked
- This could significantly improve accuracy by combining diffusion's creativity with autoregressive models' precision
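A minimal sketch of the consensus rule, assuming each verifier exposes some way to score a candidate token in context (e.g. from its own logits at that position); the `verifier_scores` callables are stand-ins, not an existing API:

```python
def consensus_unmask(candidates, verifier_scores, threshold=0.5, min_votes=2):
    """candidates: list of (position, token_id, llada_confidence) tuples
    proposed by LLaDA's diffusion step.
    verifier_scores: one callable per verifier model, mapping
    (position, token_id) -> probability that the token fits the context.
    A candidate is unmasked only when enough verifiers agree."""
    accepted = []
    for pos, tok, _conf in candidates:
        votes = sum(1 for score in verifier_scores
                    if score(pos, tok) >= threshold)
        if votes >= min_votes:
            accepted.append((pos, tok))
    return accepted  # only these positions get unmasked this step
```

Tokens that fail the vote simply stay masked and get another chance on a later step, which is why this fits diffusion better than autoregressive decoding.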
3. Semantic Steering with Embeddings
Implement semantic guidance during diffusion (see the sketch below):
- Define target semantic directions using embedding models
- As tokens are generated, compute their embeddings
- Apply gentle forces to guide the generation toward desired semantic spaces
- This could allow "steering" the response toward certain topics or styles
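One cheap way to implement the "gentle force" is a logit bias over the top-k candidates at a masked position, scored against a target embedding. The sketch below uses sentence-transformers; the blending weight `alpha` and the choice to re-score only the top 50 candidates are arbitrary assumptions:

```python
import torch
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def steer_logits(logits, tokenizer, target_text, k=50, alpha=2.0):
    """Bias one masked position's logits toward a semantic target.
    Only the top-k candidates are embedded, to keep the step cheap."""
    _, top_ids = torch.topk(logits, k)
    words = [tokenizer.decode([i]) for i in top_ids.tolist()]
    cand = torch.tensor(encoder.encode(words))             # (k, 384)
    target = torch.tensor(encoder.encode([target_text]))   # (1, 384)
    sims = torch.nn.functional.cosine_similarity(cand, target, dim=-1)
    biased = logits.clone()
    biased[top_ids] += alpha * sims.to(biased.dtype)  # a nudge, not a hard mask
    return biased
```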
4. Interactive Diffusion Interface
Create an interface allowing direct human interaction during the diffusion process (the state it needs is sketched after the list):
- Visualize tokens as they're being generated
- Allow users to "lock" good tokens they want to keep
- Let users "reject" tokens they don't want
- Enable users to provide hints for masked areas
- This turns generation into a collaborative process
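Independent of GUI toolkit, the interface boils down to a constraint set over positions that filters logits before each sampling step. A minimal sketch of that state (all names illustrative; hints would have to be injected elsewhere, e.g. into the prompt):

```python
from dataclasses import dataclass, field

@dataclass
class InteractiveState:
    """User decisions collected during diffusion."""
    locked: dict = field(default_factory=dict)    # position -> token_id to keep
    rejected: dict = field(default_factory=dict)  # position -> set of banned ids
    hints: dict = field(default_factory=dict)     # position -> free-text hint

    def apply(self, position, logits):
        """Filter one position's logits in place before sampling."""
        if position in self.locked:
            logits[:] = float("-inf")
            logits[self.locked[position]] = 0.0   # force the locked token
        for banned in self.rejected.get(position, ()):
            logits[banned] = float("-inf")        # never sample rejected ids
        return logits
```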
5. Adaptive Masking Strategies
Implement more sophisticated masking strategies (example schedule below):
- Use contextual importance to determine mask scheduling
- Key structural tokens (like those in subject positions) remain masked longer
- Content tokens get resolved earlier
- This could improve coherence significantly
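The scheduling itself can be a small function once you have per-position importance scores (from a parser, attention weights, or heuristics; how to get them is the open question). A sketch with made-up numbers:

```python
def unmask_schedule(importance, total_steps):
    """importance: one float in [0, 1] per position, higher = more
    structurally important. Returns the earliest step at which each
    position may be unmasked: important slots resolve later, after
    the surrounding context has firmed up."""
    return [int(imp * (total_steps - 1)) for imp in importance]

# Example: three positions, the middle one in a key structural slot.
print(unmask_schedule([0.1, 0.9, 0.2], total_steps=100))
# -> [9, 89, 19]: position 1 stays masked until step 89
```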
6. Cross-Modal Guidance
Use vision models to guide text diffusion (a CLIP-based sketch follows):
- For topics with visual components, use image models to verify coherence
- Example: If generating text about "a red car on a bridge," verify that the combined tokens create embeddings similar to images of red cars on bridges
- This creates a subtle alignment between text and visual worlds
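CLIP's shared text/image embedding space makes a rough version of this checkable today. A sketch using Hugging Face's CLIP (the model ID is real; the idea of re-masking when the score drops is the assumption):

```python
import torch
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def visual_coherence(partial_text, reference_images):
    """Mean cosine similarity between the partial generation and PIL
    reference images of the intended scene; a low score could trigger
    re-masking of the offending span."""
    inputs = processor(text=[partial_text], images=reference_images,
                       return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        out = clip(**inputs)
    text_emb = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    img_emb = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    return (text_emb @ img_emb.T).mean().item()

# e.g. visual_coherence("a red car on a bridge", [Image.open("car.jpg")])
```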
7. LoRA-Based Diffusion Control
Implement specialized LoRAs that work directly with the diffusion process (a config sketch follows):
- Train small LoRA adapters specifically for controlling diffusion dynamics
- These could provide stylistic control, domain-specific knowledge, etc.
- The LoRAs would directly influence which tokens get unmasked when
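With the PEFT library, attaching such an adapter is mostly configuration. The `target_modules` names below are guesses; inspect LLaDA's actual module names via `model.named_modules()` before using this:

```python
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=8,                      # adapter rank: small on purpose
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # GUESS: check the real names
)

# base_model: the loaded LLaDA model (loading omitted here)
model = get_peft_model(base_model, config)
model.print_trainable_parameters()  # only the tiny adapter is trainable
```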
8. Dynamic Confidence Thresholding
Rather than using a fixed confidence threshold (a simple schedule is sketched below):
- Learn optimal unmasking thresholds for different token positions and contexts
- Early tokens might need higher confidence
- Later tokens could use lower thresholds once context is established
- This adaptively manages the diffusion process based on position
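Before learning anything, a hand-set schedule already captures the idea: start strict and decay toward a floor as context accumulates. All three constants below are arbitrary starting points, not learned values:

```python
import math

def position_threshold(pos, length, hi=0.9, lo=0.6, decay=3.0):
    """Confidence required to unmask the token at `pos` in a response
    of `length` tokens: exponential decay from `hi` to `lo`."""
    frac = pos / max(length - 1, 1)
    return lo + (hi - lo) * math.exp(-decay * frac)

print(position_threshold(0, 64))   # 0.90 -- early tokens need high confidence
print(position_threshold(63, 64))  # ~0.61 -- late tokens ride on context
```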
9. Memory-Augmented Diffusion
Integrate an external memory system (sketched after the list):
- Store important context and knowledge in a vector database
- During diffusion, query this memory to guide token selection
- This adds long-context capabilities to LLaDA
- Perfect for document-grounded or knowledge-intensive tasks
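A sketch using FAISS as the store and sentence-transformers for embeddings. How retrieved text is injected back into the diffusion step (prepended to the prompt, used to bias logits, etc.) is deliberately left open:

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

class DiffusionMemory:
    def __init__(self):
        dim = encoder.get_sentence_embedding_dimension()
        # inner product == cosine similarity for normalized vectors
        self.index = faiss.IndexFlatIP(dim)
        self.texts = []

    def add(self, text):
        vec = encoder.encode([text], normalize_embeddings=True)
        self.index.add(np.asarray(vec, dtype="float32"))
        self.texts.append(text)

    def query(self, partial_generation, k=3):
        """Retrieve memories relevant to the current partial output;
        the caller decides how to condition the next steps on them."""
        vec = encoder.encode([partial_generation], normalize_embeddings=True)
        _, idx = self.index.search(np.asarray(vec, dtype="float32"), k)
        return [self.texts[i] for i in idx[0] if 0 <= i < len(self.texts)]
```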
10. Prototype Implementation
For a first implementation, I'd suggest starting with option 1 or option 4:
- For option 1 (LLM guidance): Create a simple API connection between LLaDA and Claude
- For option 4 (interactive interface): Extend the visualization tab to allow clicking on tokens
For 1 and 2: is it possible to use another LLaDA instance, running on a second GPU or the CPU, to check itself for errors and go back and regenerate (error repair and quality improvement, if that was the logic here)?
Self-correcting technology :)
Also an option to update its own weights to prevent the errors it finds.
Self-improving technology.
It would also be very good to have quality tests in the GUI: GSM8K and others.
CPU is much slower than GPU. I'm not sure how the AI set up the GPU offloading, but it's working pretty well. Possibly LoRA adapters or much smaller LLaDA models could be run on the CPU in tandem effectively on limited PCs.
I will see what I can do with this. https://github.com/EleutherAI/lm-evaluation-harness
Got sidetracked: I may have created "cognitive diffusion" with a vector DB, but currently it is not hooked into the process except as a demo. Integrating something like this: https://github.com/synthience/mcp-titan-cognitive-memory
Edit: Cognitive Diffusion memory is now integrated and working, though slower and more memory-intensive. Posted to the repo; sorry the repo is a mess.
New version. It was supposed to be a cleanup, but now the vector DB integration works, to a degree at least. https://github.com/angrysky56/llada_gui_new
Edit: Prototype training now available to work on!