
feat: Add Moonshot Kimi, X.AI Grok-4 and Groq model support

Detrol opened this issue 9 months ago · 12 comments

Description

This PR adds three new AI providers to the Zen MCP Server ecosystem, significantly expanding the range of available models and performance options.

Changes Made

  • Added Moonshot Kimi provider with support for:

    • kimi-latest (200K context) - Latest Kimi model for general use
    • kimi-thinking-preview (200K context) - Extended thinking capabilities
    • Aliases: moonshot-latest, moonshot-thinking, kimi-thinking
  • Enhanced X.AI provider with Grok-4 model support:

    • grok-4 (256K context) - Latest generation with multimodal capabilities
    • 100x more training data than Grok-2
    • Enhanced reasoning and visual understanding
    • Alias: grok4
  • Added comprehensive Groq provider with ultra-fast LPU technology:

    • Production Models: Gemma2 9B, Llama 3.1/3.3, Llama Guard 4
    • Preview Models: DeepSeek R1, Llama 4 Maverick/Scout, Mistral Saba, Kimi K2, Qwen 3
    • Preview Systems: Compound Beta/Mini
    • Compatibility fix: Removed incompatible prompt guard models (require single user messages)
  • Updated provider registry with proper API key mappings and priority ordering

  • Enhanced model restrictions support for all new providers

  • Updated documentation including README.md and .env.example with setup instructions

  • Added comprehensive aliases for improved user experience
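
The alias handling described above can be sketched roughly as follows. This is an illustrative snippet, not the actual server code: `MODEL_ALIASES` and `resolve_model` are hypothetical names, while the alias and model names themselves come from the PR description.

```python
# Illustrative sketch only (not the actual Zen MCP Server code): how the
# aliases listed above might map to canonical model names.
# MODEL_ALIASES and resolve_model are hypothetical names.

MODEL_ALIASES = {
    # Moonshot Kimi aliases
    "moonshot-latest": "kimi-latest",
    "moonshot-thinking": "kimi-thinking-preview",
    "kimi-thinking": "kimi-thinking-preview",
    # X.AI alias
    "grok4": "grok-4",
}

def resolve_model(name: str) -> str:
    """Return the canonical model name for an alias, or the name unchanged."""
    return MODEL_ALIASES.get(name.lower(), name)

print(resolve_model("grok4"))          # grok-4
print(resolve_model("kimi-thinking"))  # kimi-thinking-preview
print(resolve_model("kimi-latest"))    # kimi-latest (already canonical)
```

Lower-casing the lookup makes aliases case-insensitive while leaving unknown names untouched, so canonical model IDs pass straight through.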

Testing

  • 37 new unit tests covering all providers and models
  • Integration tests for model validation and capabilities
  • Comprehensive coverage of aliases, restrictions, and error handling
  • Real API compatibility verified for all providers
  • Code quality checks passing (ruff, black, isort)
  • All existing tests continue to pass

Related Issues

Addresses community request for Groq integration and expands model availability options.

Checklist

  • [x] PR title follows the format guidelines
  • [x] Ran ./code_quality_checks.sh (all checks passed 100%)
  • [x] Self-review completed
  • [x] Tests added for ALL changes
  • [x] Documentation updated as needed
  • [x] All unit tests passing
  • [x] Relevant simulator tests passing (if tool changes)
  • [x] Ready for review

Additional Notes

  • Groq models provide ultra-fast inference (200+ tokens/sec) with LPU technology
  • Model restrictions configuration examples included for cost control
  • Backward compatibility maintained for existing configurations
  • Note: mistral-saba-24b may require terms acceptance at Groq console

Configuration example:

MOONSHOT_API_KEY=your_moonshot_api_key_here
XAI_API_KEY=your_xai_api_key_here
GROQ_API_KEY=your_groq_api_key_here
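
The model-restriction support mentioned in the notes can be combined with these keys for cost control. A hedged sketch: the `*_ALLOWED_MODELS` pattern appears later in this thread for Google, but whether the Groq and Moonshot variants use exactly these variable names and model IDs should be verified against .env.example.

```shell
# Hypothetical .env sketch for cost control via model restrictions.
# Variable names follow the *_ALLOWED_MODELS pattern; the exact model IDs
# are assumptions, verify them against .env.example and the provider docs.
GROQ_ALLOWED_MODELS=gemma2-9b-it,llama-3.3-70b-versatile
MOONSHOT_ALLOWED_MODELS=kimi-latest
```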

Detrol avatar Jul 16 '25 06:07 Detrol

@Detrol if you add Kimi, please add the option to use Groq ☺️ 🙏

michabbb avatar Jul 16 '25 13:07 michabbb

@Detrol if you add Kimi, please add the option to use Groq ☺️ 🙏

Groq is neither a provider nor a model.

Detrol avatar Jul 16 '25 13:07 Detrol

Groq is neither a provider nor a model.

Okay, what is it then? Enlighten me.

michabbb avatar Jul 16 '25 13:07 michabbb

Groq

I admit I spoke too soon; I hadn't heard of it or read up on it properly. I see now it's similar to OpenRouter. I will check it out.

Detrol avatar Jul 16 '25 13:07 Detrol

I admit I spoke too soon; I hadn't heard of it or read up on it properly. I see now it's similar to OpenRouter. I will check it out.

Okay, welcome to the world of 200 tokens per second. If you add Kimi (which is great, by the way), I just wanted to point out that even though there are multiple "providers" out there, I'm pretty sure most people prefer the fastest one. Even though Kimi is still in beta on Groq, it already works really well... and the speed is incredible.

That being said, adding "official" providers is nice, but in this case it's absolutely worth adding Groq (yes, it really is a provider 😏). And if you've never used it, holy moly, you're missing out. Same goes for SambaNova or Cerebras.

Anyway, maybe there's a way to let the user choose which provider should be used for Kimi (or any other model). I don't know the details of the code here, but I wanted to bring it up before the Kimi integration gets finalized.

Thanks!

michabbb avatar Jul 16 '25 13:07 michabbb

Anyway, maybe there's a way to let the user choose which provider should be used for Kimi (or any other model).

Sounds great! I will be adding it because I'm really intrigued. It's already possible to decide which models are used from which provider by defining *_ALLOWED_MODELS for the specific provider in the .env file.

Detrol avatar Jul 16 '25 14:07 Detrol

It's already possible to decide which models are used from which provider by defining *_ALLOWED_MODELS for the specific provider in the .env file.

I'm not sure if that works, because as I see it:

GOOGLE_ALLOWED_MODELS=flash,pro

it says: which provider is allowed to use which model

but in this case i was more thinking about something:

IF the user wants to use MODEL XYZ, always choose provider XYZ

😏

michabbb avatar Jul 16 '25 14:07 michabbb

IF the user wants to use MODEL XYZ, always choose provider XYZ

Just disallow the model for provider X, so only provider Y can use it. Then it should pick that provider automatically.
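
In .env terms, that suggestion might look like the sketch below. This is an assumption-laden example: the variable names follow the *_ALLOWED_MODELS pattern quoted above, and the Groq-hosted Kimi model ID is illustrative.

```shell
# Hypothetical sketch: route Kimi K2 requests to Groq by allowing the model
# only there. Variable names follow the *_ALLOWED_MODELS pattern; the Groq
# model ID below is an assumption, check the Groq console for the real one.
MOONSHOT_ALLOWED_MODELS=kimi-latest               # Moonshot: K2 not allowed here
GROQ_ALLOWED_MODELS=moonshotai/kimi-k2-instruct   # Groq: serves Kimi K2
```

Since only one provider is permitted to serve the model, the registry's priority ordering has a single candidate left and picks it automatically.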

Detrol avatar Jul 16 '25 14:07 Detrol

@Detrol Thanks for doing this! I cloned your repo locally to use Moonshot, but had to run the dos2unix utility on run-server.sh to replace the Windows line-break characters before it would run properly on my Mac. Not sure if this is on your side or not, but thought it was worth noting.
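
For anyone hitting the same issue, the fix can be reproduced like this. The snippet simulates a CRLF-damaged script in /tmp rather than touching a real checkout; `tr` is shown because it ships with stock macOS, while `dos2unix` may need installing first (e.g. via Homebrew).

```shell
# Simulate a script saved with Windows (CRLF) line endings, then strip the
# carriage returns. On a real checkout you would run `dos2unix run-server.sh`
# (or the tr pipeline below) on the actual file.
printf '#!/bin/sh\r\necho ok\r\n' > /tmp/run-server.sh
tr -d '\r' < /tmp/run-server.sh > /tmp/run-server.fixed.sh
mv /tmp/run-server.fixed.sh /tmp/run-server.sh
sh /tmp/run-server.sh   # prints: ok
```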

tayiorbeii avatar Jul 23 '25 21:07 tayiorbeii

Thank you! Grok-4 was added earlier today. Can you please merge and resolve conflicts, and I'll look at this again.

guidedways avatar Aug 08 '25 05:08 guidedways

Who is in charge here? What are we waiting for?

michabbb avatar Aug 08 '25 07:08 michabbb

GLM (z.ai) 🙏

SWSAmor avatar Nov 23 '25 17:11 SWSAmor