build(deps): upgrade to LangChain 1.x and resolve transitive NumPy 2.0+ conflicts
🚀 Summary
This PR modernizes the project's dependency graph to enable full integration with the LangChain 1.x ecosystem. It addresses critical "dependency hell" issues caused by upstream transitive dependencies (specifically verl and TransferQueue) incompatible with NumPy 2.0+.
Additionally, this PR refactors the internal agent architecture to align with updated LangChain APIs, replacing deprecated usage with modern patterns (LangSmith, LangGraph).
🛠️ Dependency Resolution Strategy
The Problem: Conflicting Requirements
The environment upgrade was blocked by strict version pinning in upstream dependencies:
| Component | Requirement | Status |
|---|---|---|
| LangChain 1.x | numpy >= 2.0.0 |
Target |
TransferQueue (via verl) |
numpy < 2.0.0 |
❌ Conflict |
The Solution: Source Overrides & Modernization
Since we cannot wait for upstream releases, I have applied a temporary Source Override strategy to unblock the build.
- Upstream Modernization: Forked
TransferQueueandverl, migrating their build systems topyproject.toml(PEP 621). - NumPy 2.0 Compatibility: Relaxed
numpyconstraints in theTransferQueuefork after verifying API usage. - Conflict Elimination: Removed the problematic
sglangoptional dependency group from theverlfork.
Implementation Details
The pyproject.toml has been updated to use [tool.uv.sources] pointing to the patched forks:
# Changes in agent-lightning/pyproject.toml
[tool.uv.sources]
transferqueue = { git = "[https://github.com/Acture/TransferQueue.git](https://github.com/Acture/TransferQueue.git)"}
verl = { git = "[https://github.com/Acture/verl.git](https://github.com/Acture/verl.git)"}
Safety Check: analysis confirms TransferQueue uses standard NumPy APIs. It does not rely on C-API structures changed in NumPy 2.0, ensuring ABI compatibility.
⚡ Codebase Modernization (LangChain 1.x)
Apart from dependency fixes, the following architectural changes were made to support LangChain 1.x:
LangSmith Integration: Replaced deprecated langchain hub usage with langsmith.Client.
Agent Construction: Switched from the legacy create_react_agent to the more flexible create_agent pattern; updated prompt handling accordingly, alsoupdated calculator tool usage.
⚠️ Known CI/CD Limitations
Expect the regression test suite to fail.
Since this PR updates core LangChain prompt templates (e.g., switching to langsmith hub patterns), the request signatures have changed. The existing mock server cache is now invalidated, resulting in the following error during integration tests:
ValueError: No suitable cached response found. Please ensure the prompt caches are populated.
Root Cause:
The Mock/Caching layer cannot find a matching response key for the modernized prompt structure.
Resolution:
This is not a code error. To resolve this, the test suite must be run in record mode against a live LLM endpoint to regenerate the cache. (Blocked by missing credentials in the local/current CI environment).
@Acture please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.
@microsoft-github-policy-service agree [company="{your company}"]Options:
- (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
- (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"Contributor License Agreement
@microsoft-github-policy-service agree
@ultmaster
please revert unrelated changes and summarize the core changes needed.
please revert unrelated changes and summarize the core changes needed.
I have reverted unrelated changes like .gitignore. I've also reverted uv.lock as you requested. Please take a look at the PR description. The summary of core changes are included.
/ci
🚀 CI Watcher for correlation id-3605122076-mipjz6ty triggered by comment 3605122076 🏃♀️ Tracking 1 workflow run(s):
- 🔴 Spider - PR #350 - ci-spider - id-3605122076-mipjz6ty —
completed/failure
✅ All runs completed.
Would you fix the failed tests?
Would you fix the failed tests?
This was caused by the outdated uv.lock file, , which was reverted previously as you requested. The current lockfile does not incorporate the necessary dependency resolutions (NumPy 2.0 compatibility). If you wish so, I can update the uv.lock file again (which will result in over 2,000 lines of change in uv.lock that you saw earlier)
/ci
🚀 CI Watcher for correlation id-3606124420-mipv1cwn triggered by comment 3606124420 🏃♀️ Tracking 1 workflow run(s):
- 🔴 Spider - PR #350 - ci-spider - id-3606124420-mipv1cwn —
completed/failure
✅ All runs completed.
ValueError: No suitable cached response found. Please ensure the prompt caches are populated. ERROR: Exception in ASGI application
Failure was caused by the lack of corresponding cache, which really can't be fixed by myself.
ValueError: No suitable cached response found. Please ensure the prompt caches are populated. ERROR: Exception in ASGI application
Failure was caused by the lack of corresponding cache, which really can't be fixed by myself.
I think you can fix it. You just need to set the global variable (OPENAI_BASE_URL) in different modes.
But let's pause here. I'll take over the PR and fix the rest of the issues.