mlx-swift-examples

llm-tool: crash during async generation

Open zach-brockway opened this issue 6 months ago • 10 comments

Steps to reproduce:

  1. Move the @main annotation in LLMTool.swift from SyncGenerator to AsyncGenerator.
  2. Run llm-tool target.

Result: Thread 4: EXC_BAD_ACCESS (code=2, address=0x16ff1bfe0)

Call stack:

#0	0x00000001050d4ac0 in mlx::core::eval(std::__1::vector<mlx::core::array, std::__1::allocator<mlx::core::array>> const&)::$_3::operator()(mlx::core::array const&, bool) const at /Users/zach/Library/Developer/Xcode/DerivedData/mlx-swift-examples-ffvauqjflgpefffxjxtewhqyjymm/SourcePackages/checkouts/mlx-swift/Source/Cmlx/mlx/mlx/transforms.cpp:55
#1	0x00000001050d4ab0 in decltype(std::declval<mlx::core::eval(std::__1::vector<mlx::core::array, std::__1::allocator<mlx::core::array>> const&)::$_3&>()(std::declval<mlx::core::array const&>(), std::declval<bool>())) std::__1::__invoke[abi:v160006]<mlx::core::eval(std::__1::vector<mlx::core::array, std::__1::allocator<mlx::core::array>> const&)::$_3&, mlx::core::array const&, bool>(mlx::core::eval(std::__1::vector<mlx::core::array, std::__1::allocator<mlx::core::array>> const&)::$_3&, mlx::core::array const&, bool&&) at /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/__functional/invoke.h:394
#2	0x00000001050d4a50 in void std::__1::__invoke_void_return_wrapper<void, true>::__call<mlx::core::eval(std::__1::vector<mlx::core::array, std::__1::allocator<mlx::core::array>> const&)::$_3&, mlx::core::array const&, bool>(mlx::core::eval(std::__1::vector<mlx::core::array, std::__1::allocator<mlx::core::array>> const&)::$_3&, mlx::core::array const&, bool&&) at /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/__functional/invoke.h:487

(... snip 5000 frames of mlx::core::eval recursion...)

#5320	0x00000001050c8d9c in mlx::core::eval(std::__1::vector<mlx::core::array, std::__1::allocator<mlx::core::array>> const&) at /Users/zach/Library/Developer/Xcode/DerivedData/mlx-swift-examples-ffvauqjflgpefffxjxtewhqyjymm/SourcePackages/checkouts/mlx-swift/Source/Cmlx/mlx/mlx/transforms.cpp:119
#5321	0x0000000104537828 in ::mlx_eval(mlx_vector_array) at /Users/zach/Library/Developer/Xcode/DerivedData/mlx-swift-examples-ffvauqjflgpefffxjxtewhqyjymm/SourcePackages/checkouts/mlx-swift/Source/Cmlx/include/mlx/c/transforms.cpp:20
#5322	0x0000000105199c80 in eval(_:) at /Users/zach/Library/Developer/Xcode/DerivedData/mlx-swift-examples-ffvauqjflgpefffxjxtewhqyjymm/SourcePackages/checkouts/mlx-swift/Source/MLX/Transforms+Eval.swift:12
#5323	0x0000000104494928 in closure #1 in generate(prompt:model:temp:) at /Users/zach/Documents/Development/GitHub/mlx/mlx-swift-examples/Libraries/LLM/Util.swift:103
#5324	0x0000000104494f14 in partial apply for closure #1 in generate(prompt:model:temp:) ()
#5325	0x0000000104495910 in thunk for @escaping @callee_guaranteed @Sendable @async () -> (@out τ_0_0) ()
#5326	0x0000000104495a6c in partial apply for thunk for @escaping @callee_guaranteed @Sendable @async () -> (@out τ_0_0) ()

zach-brockway avatar Feb 26 '24 01:02 zach-brockway

The call stacks get really deep when compiled for Debug. Most of these frames are tail calls, so if you compile for Release this will work. I will document this.

davidkoski avatar Feb 26 '24 02:02 davidkoski

Thanks for the quick reply! I wondered if that was what was happening, but after trying a couple of things, I got the impression the stack size can't really be controlled with Swift concurrency (unless/until more executor stuff lands, maybe?). It wasn't immediately clear to me how to e.g. link an optimized build of the dependency into an otherwise debug build of my app, either. I don't have a ton of Swift experience, so grateful for any suggestions!

zach-brockway avatar Feb 26 '24 05:02 zach-brockway

Yeah, I don't know that the stack size can be controlled either. Another option (maybe) is to put @MainActor on it. This would work fine for a command line tool like this, but wouldn't work for an application, e.g. if you wanted to run this on iOS. This might work for you, depending on what you are doing.
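A minimal sketch of the @MainActor idea, with hypothetical names (the real tool's generation loop differs): pinning the loop to the main actor means the deeply recursive eval runs on the main thread, whose default stack is much larger than the cooperative pool's.

```swift
import Foundation

// Hypothetical sketch: a @MainActor generation loop. The heavy, deeply
// recursive work inside `step` runs on the main thread. Fine for a CLI,
// but it would block the UI in an app.
@MainActor
func generateTokens(steps: Int, step: () -> Int) -> [Int] {
	var tokens: [Int] = []
	for _ in 0..<steps {
		tokens.append(step())  // deep Debug-build eval would happen here
	}
	return tokens
}
```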

As for how to do that when you have a Debug build, one thing that I tried much earlier was to add -O3 in mlx-swift's Package.swift.

For example around line 87:

            cxxSettings: [
                .headerSearchPath("mlx"),
                .headerSearchPath("include/mlx-c"),

You can add something like .unsafeOptions(["-O3"]),. This isn't suitable for putting in the Package.swift in the repo as it makes it unbuildable in most situations (limitations around the unsafe options) but I think it ought to work for development.

davidkoski avatar Feb 26 '24 06:02 davidkoski

Makes sense. Yeah, unfortunately, I was trying to wrap it with SwiftUI. I'll poke around at it more this evening after work. Thanks again!

zach-brockway avatar Feb 26 '24 16:02 zach-brockway

I added some notes in the README: 8870b0d3868a4188b5b160c374ecfb12fc0a18bc

davidkoski avatar Feb 26 '24 18:02 davidkoski

Thanks. Sadly, I tried .unsafeFlags(["-O3"]) in both cSettings and cxxSettings for the Cmlx target without any luck.

I ended up kludging together a variation of generate() that's a cheesy mishmash of NSThread + async; I'm not even sure this semaphore backpressure hack will actually work the way I intended, but FWIW:

import AsyncAlgorithms  // AsyncChannel, AsyncBufferSequence, buffer(policy:)
import Foundation       // Thread, DispatchSemaphore
import MLX              // MLXArray, eval, expandedDimensions

// LLMModel and sample(logits:temp:) come from this repo's Libraries/LLM.

public class LLMGeneratorThread: Thread {
	public let buffer: AsyncBufferSequence<AsyncChannel<Int>>

	let channel = AsyncChannel<Int>()
	let prompt: MLXArray
	let model: LLMModel
	let sem = DispatchSemaphore(value: 0)

	public init(prompt: MLXArray, model: LLMModel) {
		self.prompt = prompt
		self.model = model
		self.buffer = channel.buffer(policy: .bounded(10))

		super.init()

		// Enlarged stack to survive the deep Debug-build eval recursion.
		self.stackSize = 8 * 1024 * 1024
	}

	public override func main() {
		var y = prompt
		var cache = [(MLXArray, MLXArray)]()
		while !isCancelled {
			var logits: MLXArray
			(logits, cache) = model(
				expandedDimensions(y, axis: 0), cache: cache.isEmpty ? nil : cache)
			y = sample(logits: logits[-1, axis: 1], temp: 1.0)
			eval(y)

			let token = y.item(Int.self)
			// Forward the token to async consumers, then block this thread
			// until the send has completed (the backpressure hack).
			Task {
				await channel.send(token)
				sem.signal()
			}
			sem.wait()
		}
		// Let consumers finish instead of hanging if the thread is cancelled.
		channel.finish()
	}
}
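For comparison, here is a standard-library-only sketch of the same pattern (hypothetical names; no MLX or swift-async-algorithms dependency): a producer Thread with an enlarged stack, throttled by a semaphore so it never runs more than a fixed number of tokens ahead of the async consumer.

```swift
import Foundation

// Hypothetical sketch of the backpressure idea using only AsyncStream:
// the producer thread blocks on `slots` until the consumer frees a slot,
// so it can run at most `capacity` tokens ahead.
final class TokenProducerThread: Thread {
	let stream: AsyncStream<Int>
	private let continuation: AsyncStream<Int>.Continuation
	private let slots: DispatchSemaphore
	private let tokens: [Int]

	init(tokens: [Int], capacity: Int = 4) {
		self.tokens = tokens
		self.slots = DispatchSemaphore(value: capacity)
		var cont: AsyncStream<Int>.Continuation!
		self.stream = AsyncStream { cont = $0 }
		self.continuation = cont
		super.init()
		stackSize = 8 * 1024 * 1024  // room for deep Debug-build recursion
	}

	override func main() {
		for token in tokens {
			slots.wait()              // block until the consumer frees a slot
			continuation.yield(token)
		}
		continuation.finish()         // let `for await` loops terminate
	}

	func didConsume() { slots.signal() }  // consumer calls this per token
}
```

The consumer iterates `for await token in thread.stream { …; thread.didConsume() }`, signaling a slot back to the producer after handling each token.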

zach-brockway avatar Feb 27 '24 06:02 zach-brockway

Yeah, mixing the two like this is not ideal. I wonder if you could put this code under #if DEBUG and have an #else branch with straight async code for Release builds? Also not ideal, but if you encapsulate it, at least it is one particular spot to watch.
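One way to sketch that encapsulation (a hypothetical helper, not from the repo): a single factory function whose Debug branch uses a dedicated thread with an enlarged stack and whose Release branch uses plain structured concurrency, so the workaround lives in one spot.

```swift
import Foundation

// Hypothetical sketch: hide the Debug-only thread workaround behind one
// function. `produce` returns the next token, or nil when generation ends.
func makeTokenStream(produce: @escaping () -> Int?) -> AsyncStream<Int> {
#if DEBUG
	// Debug: run the producer on a dedicated Thread with an enlarged stack,
	// since unoptimized builds recurse deeply inside eval.
	return AsyncStream { continuation in
		let thread = Thread {
			while let token = produce() { continuation.yield(token) }
			continuation.finish()
		}
		thread.stackSize = 8 * 1024 * 1024
		thread.start()
	}
#else
	// Release: tail calls are optimized away, so plain async code is fine.
	return AsyncStream { continuation in
		Task {
			while let token = produce() { continuation.yield(token) }
			continuation.finish()
		}
	}
#endif
}
```

Callers just iterate the returned stream; which branch runs is decided entirely inside this one function.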

davidkoski avatar Feb 27 '24 16:02 davidkoski

> You can add something like .unsafeOptions(["-O3"]),. This isn't suitable for putting in the Package.swift in the repo as it makes it unbuildable in most situations (limitations around the unsafe options) but I think it ought to work for development.

Maybe unsafeFlags?

maiqingqiang avatar Mar 03 '24 11:03 maiqingqiang

> Maybe unsafeFlags?

Yeah, that's the ticket :-)

davidkoski avatar Mar 05 '24 19:03 davidkoski

@davidkoski @zach-brockway I have tried all the optimization flags; only -O3 does not work. -O, -O1, -O2, and -Os all work. 😂
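Putting the thread's findings together, the Debug-only workaround in mlx-swift's Package.swift would look roughly like this (a sketch against the settings quoted above; line numbers and surrounding entries may have drifted):

```swift
// In mlx-swift's Package.swift, Cmlx target (sketch; not for committing,
// since unsafeFlags makes the package unusable as a remote dependency):
cxxSettings: [
	.headerSearchPath("mlx"),
	.headerSearchPath("include/mlx-c"),
	.unsafeFlags(["-O2"]),  // -O, -O1, -O2, -Os reportedly work; -O3 did not
],
```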


maiqingqiang avatar Mar 06 '24 16:03 maiqingqiang

The deep recursion issues are fixed as of https://github.com/ml-explore/mlx-swift/pull/67

Eval in an async context is safe. You can even run Debug builds!

davidkoski avatar Apr 26 '24 07:04 davidkoski