agent-lightning icon indicating copy to clipboard operation
agent-lightning copied to clipboard

build(deps): upgrade to LangChain 1.x and resolve transitive NumPy 2.0+ conflicts

Open Acture opened this issue 1 month ago • 12 comments

🚀 Summary

This PR modernizes the project's dependency graph to enable full integration with the LangChain 1.x ecosystem. It addresses critical "dependency hell" issues caused by upstream transitive dependencies (specifically verl and TransferQueue) incompatible with NumPy 2.0+.

Additionally, this PR refactors the internal agent architecture to align with updated LangChain APIs, replacing deprecated usage with modern patterns (LangSmith, LangGraph).


🛠️ Dependency Resolution Strategy

The Problem: Conflicting Requirements

The environment upgrade was blocked by strict version pinning in upstream dependencies:

Component Requirement Status
LangChain 1.x numpy >= 2.0.0 Target
TransferQueue (via verl) numpy < 2.0.0 Conflict

The Solution: Source Overrides & Modernization

Since we cannot wait for upstream releases, I have applied a temporary Source Override strategy to unblock the build.

  1. Upstream Modernization: Forked TransferQueue and verl, migrating their build systems to pyproject.toml (PEP 621).
  2. NumPy 2.0 Compatibility: Relaxed numpy constraints in the TransferQueue fork after verifying API usage.
  3. Conflict Elimination: Removed the problematic sglang optional dependency group from the verl fork.

Implementation Details

The pyproject.toml has been updated to use [tool.uv.sources] pointing to the patched forks:

# Changes in agent-lightning/pyproject.toml
[tool.uv.sources]
transferqueue = { git = "[https://github.com/Acture/TransferQueue.git](https://github.com/Acture/TransferQueue.git)"}
verl = { git = "[https://github.com/Acture/verl.git](https://github.com/Acture/verl.git)"}

Safety Check: analysis confirms TransferQueue uses standard NumPy APIs. It does not rely on C-API structures changed in NumPy 2.0, ensuring ABI compatibility.

⚡ Codebase Modernization (LangChain 1.x)

Apart from dependency fixes, the following architectural changes were made to support LangChain 1.x:

LangSmith Integration: Replaced deprecated langchain hub usage with langsmith.Client.

Agent Construction: Switched from the legacy create_react_agent to the more flexible create_agent pattern; updated prompt handling accordingly, alsoupdated calculator tool usage.

⚠️ Known CI/CD Limitations

Expect the regression test suite to fail.

Since this PR updates core LangChain prompt templates (e.g., switching to langsmith hub patterns), the request signatures have changed. The existing mock server cache is now invalidated, resulting in the following error during integration tests:

ValueError: No suitable cached response found. Please ensure the prompt caches are populated.

Root Cause: The Mock/Caching layer cannot find a matching response key for the modernized prompt structure. Resolution: This is not a code error. To resolve this, the test suite must be run in record mode against a live LLM endpoint to regenerate the cache. (Blocked by missing credentials in the local/current CI environment).

Acture avatar Dec 02 '25 09:12 Acture

@Acture please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement

@microsoft-github-policy-service agree

Acture avatar Dec 02 '25 09:12 Acture

@ultmaster

Acture avatar Dec 02 '25 09:12 Acture

please revert unrelated changes and summarize the core changes needed.

ultmaster avatar Dec 02 '25 09:12 ultmaster

please revert unrelated changes and summarize the core changes needed.

I have reverted unrelated changes like .gitignore. I've also reverted uv.lock as you requested. Please take a look at the PR description. The summary of core changes are included.

Acture avatar Dec 02 '25 09:12 Acture

/ci

ultmaster avatar Dec 03 '25 05:12 ultmaster

🚀 CI Watcher for correlation id-3605122076-mipjz6ty triggered by comment 3605122076 🏃‍♀️ Tracking 1 workflow run(s):

✅ All runs completed.

github-actions[bot] avatar Dec 03 '25 05:12 github-actions[bot]

Would you fix the failed tests?

ultmaster avatar Dec 03 '25 07:12 ultmaster

Would you fix the failed tests?

This was caused by the outdated uv.lock file, , which was reverted previously as you requested. The current lockfile does not incorporate the necessary dependency resolutions (NumPy 2.0 compatibility). If you wish so, I can update the uv.lock file again (which will result in over 2,000 lines of change in uv.lock that you saw earlier)

Acture avatar Dec 03 '25 07:12 Acture

/ci

ultmaster avatar Dec 03 '25 10:12 ultmaster

🚀 CI Watcher for correlation id-3606124420-mipv1cwn triggered by comment 3606124420 🏃‍♀️ Tracking 1 workflow run(s):

✅ All runs completed.

github-actions[bot] avatar Dec 03 '25 10:12 github-actions[bot]

ValueError: No suitable cached response found. Please ensure the prompt caches are populated. ERROR: Exception in ASGI application

Failure was caused by the lack of corresponding cache, which really can't be fixed by myself.

Acture avatar Dec 03 '25 11:12 Acture

ValueError: No suitable cached response found. Please ensure the prompt caches are populated. ERROR: Exception in ASGI application

Failure was caused by the lack of corresponding cache, which really can't be fixed by myself.

I think you can fix it. You just need to set the global variable (OPENAI_BASE_URL) in different modes.

But let's pause here. I'll take over the PR and fix the rest of the issues.

ultmaster avatar Dec 03 '25 14:12 ultmaster