go-livepeer

fix(ai): fix ai model config parsing

Open • ad-astra-video opened this issue 1 year ago • 1 comment

What does this pull request do? Explain your changes. (required)

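aiModels.json parsing failed to set the price for a model whose config entries were interleaved with entries for other pipelines/models. In the failing run below, the last segment-anything-2 block is advertised with ID 27 (the text-to-image capability), so the price lookup for that capability/model pair returns nil and formatting it panics. See the logs below and the attached aiModels.json. Note that some extra log lines were added for visibility into aiCaps and into the autoPrice set from the config.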

vires-in-numeris pointed out this bug with the following log lines:

2024/09/17 14:10:30 INFO Starting external container name=segment-anything-2_facebook-sam2-hiera-large_http://localhost:9010 modelID=facebook/sam2-hiera-large
I0917 14:10:30.334105       1 db.go:368] Closing DB
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x19ff1cd]

goroutine 74 [running]:
math/big.(*Rat).IsInt(...)
        /usr/local/go/src/math/big/rat.go:401
math/big.(*Rat).FloatString(0x0, 0x3)
        /usr/local/go/src/math/big/ratconv.go:333 +0x2d
github.com/livepeer/go-livepeer/cmd/livepeer/starter.StartLivepeer({_, _}, {0xc000c84250, 0xc000c84260, 0xc000c84270, 0xc000c84280, 0xc000c84290, 0xc000c842c0, 0xc000c842a0, 0xc000c84410, ...})
        /src/cmd/livepeer/starter/starter.go:1337 +0xb03e
main.main.func1()
        /src/cmd/livepeer/livepeer.go:97 +0x59
created by main.main in goroutine 1
        /src/cmd/livepeer/livepeer.go:96 +0xbe5

I was able to reproduce the segfault with the attached aiModels.json.

aiModels.json

livepeer-test-orchestrator-sam2-1  | I0917 23:11:54.262617       1 pricefeedwatcher.go:164] Starting PriceFeed watch loop
livepeer-test-orchestrator-sam2-1  | 2024/09/17 23:11:55 INFO Starting external container name=segment-anything-2_facebook-sam2-hiera-large_https://lychee-arugula-rkydxbawjmlawca1.salad.cloud modelID=facebook/sam2-hiera-large
livepeer-test-orchestrator-sam2-1  | I0917 23:11:55.144057       1 starter.go:1335] +v%!(EXTRA []core.Capability=[32])
livepeer-test-orchestrator-sam2-1  | I0917 23:11:55.144114       1 starter.go:1338] Capability segment-anything-2 (ID: 32) advertised with model constraint facebook/sam2-hiera-large at price 10353998.886 wei per compute unit
livepeer-test-orchestrator-sam2-1  | I0917 23:11:55.144149       1 starter.go:1339] Capability segment-anything-2 (ID: 32) advertised with model constraint facebook/sam2-hiera-large at price 10353998.886 wei per compute unit
livepeer-test-orchestrator-sam2-1  | 2024/09/17 23:11:55 INFO Starting external container name=segment-anything-2_facebook-sam2-hiera-large_https://lychee-arugula-rkydxbawjmlawca1.salad.cloud modelID=facebook/sam2-hiera-large
livepeer-test-orchestrator-sam2-1  | I0917 23:11:55.867284       1 starter.go:1335] +v%!(EXTRA []core.Capability=[32])
livepeer-test-orchestrator-sam2-1  | I0917 23:11:55.867352       1 starter.go:1338] Capability segment-anything-2 (ID: 32) advertised with model constraint facebook/sam2-hiera-large at price 10353998.886 wei per compute unit
livepeer-test-orchestrator-sam2-1  | I0917 23:11:55.867420       1 starter.go:1339] Capability segment-anything-2 (ID: 32) advertised with model constraint facebook/sam2-hiera-large at price 10353998.886 wei per compute unit
livepeer-test-orchestrator-sam2-1  | 2024/09/17 23:11:56 INFO Starting external container name=text-to-image_ByteDance-SDXL-Lightning_https://lychee-arugula-rkydxbawjmlawca1.salad.cloud modelID=ByteDance/SDXL-Lightning
livepeer-test-orchestrator-sam2-1  | I0917 23:11:56.558637       1 starter.go:1335] +v%!(EXTRA []core.Capability=[32 27])
livepeer-test-orchestrator-sam2-1  | I0917 23:11:56.558696       1 starter.go:1338] Capability text-to-image (ID: 27) advertised with model constraint ByteDance/SDXL-Lightning at price 817750.957 wei per compute unit
livepeer-test-orchestrator-sam2-1  | I0917 23:11:56.558730       1 starter.go:1339] Capability text-to-image (ID: 27) advertised with model constraint ByteDance/SDXL-Lightning at price 817750.957 wei per compute unit
livepeer-test-orchestrator-sam2-1  | 2024/09/17 23:11:57 INFO Starting external container name=segment-anything-2_facebook-sam2-hiera-large_https://lychee-arugula-rkydxbawjmlawca1.salad.cloud modelID=facebook/sam2-hiera-large
livepeer-test-orchestrator-sam2-1  | I0917 23:11:57.315927       1 starter.go:1335] +v%!(EXTRA []core.Capability=[32 27])
livepeer-test-orchestrator-sam2-1  | I0917 23:11:57.315983       1 starter.go:1338] Capability segment-anything-2 (ID: 27) advertised with model constraint facebook/sam2-hiera-large at price 10353998.886 wei per compute unit
livepeer-test-orchestrator-sam2-1  | I0917 23:11:57.316071       1 db.go:368] Closing DB
livepeer-test-orchestrator-sam2-1  | panic: runtime error: invalid memory address or nil pointer dereference
livepeer-test-orchestrator-sam2-1  | [signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x19ff44d]
livepeer-test-orchestrator-sam2-1  |
livepeer-test-orchestrator-sam2-1  | goroutine 11 [running]:
livepeer-test-orchestrator-sam2-1  | math/big.(*Rat).IsInt(...)
livepeer-test-orchestrator-sam2-1  |    /usr/local/go/src/math/big/rat.go:401
livepeer-test-orchestrator-sam2-1  | math/big.(*Rat).FloatString(0x0, 0x3)
livepeer-test-orchestrator-sam2-1  |    /usr/local/go/src/math/big/ratconv.go:333 +0x2d
livepeer-test-orchestrator-sam2-1  | github.com/livepeer/go-livepeer/cmd/livepeer/starter.StartLivepeer({_, _}, {0xc000051600, 0xc000051610, 0xc000051620, 0xc000051630, 0xc000051640, 0xc000051670, 0xc000051650, 0xc0000517c0, ...})
livepeer-test-orchestrator-sam2-1  |    /src/cmd/livepeer/starter/starter.go:1339 +0xb2de
livepeer-test-orchestrator-sam2-1  | main.main.func1()
livepeer-test-orchestrator-sam2-1  |    /src/cmd/livepeer/livepeer.go:97 +0x59
livepeer-test-orchestrator-sam2-1  | created by main.main in goroutine 1
livepeer-test-orchestrator-sam2-1  |    /src/cmd/livepeer/livepeer.go:96 +0xbe5
livepeer-test-orchestrator-sam2-1 exited with code 2

Specific updates (required)

  • Update cmd/livepeer/starter/starter.go to track the capability of the current config block and look up the price for that block's capability/model_id pair.
  • Add a nil check to GetBasePriceForCap.

How did you test each of these updates (required)

Rebuilt the Docker container and ran it with the same aiModels.json. The orchestrator node now starts up, and each config block is advertised with its own capability ID and price:

livepeer-test-orchestrator-sam2-1  | I0918 00:00:32.331950       1 pricefeedwatcher.go:164] Starting PriceFeed watch loop
livepeer-test-orchestrator-sam2-1  | 2024/09/18 00:00:33 INFO Starting external container name=segment-anything-2_facebook-sam2-hiera-large_https://lychee-arugula-rkydxbawjmlawca1.salad.cloud modelID=facebook/sam2-hiera-large
livepeer-test-orchestrator-sam2-1  | I0918 00:00:33.286424       1 starter.go:1347] Capability segment-anything-2 (ID: 32) advertised with model constraint facebook/sam2-hiera-large at price 10312544.441 wei per compute unit
livepeer-test-orchestrator-sam2-1  | 2024/09/18 00:00:33 INFO Starting external container name=segment-anything-2_facebook-sam2-hiera-large_https://lychee-arugula-rkydxbawjmlawca1.salad.cloud modelID=facebook/sam2-hiera-large
livepeer-test-orchestrator-sam2-1  | I0918 00:00:33.974178       1 starter.go:1347] Capability segment-anything-2 (ID: 32) advertised with model constraint facebook/sam2-hiera-large at price 10312544.441 wei per compute unit
livepeer-test-orchestrator-sam2-1  | 2024/09/18 00:00:34 INFO Starting external container name=text-to-image_ByteDance-SDXL-Lightning_https://lychee-arugula-rkydxbawjmlawca1.salad.cloud modelID=ByteDance/SDXL-Lightning
livepeer-test-orchestrator-sam2-1  | I0918 00:00:34.641733       1 starter.go:1347] Capability text-to-image (ID: 27) advertised with model constraint ByteDance/SDXL-Lightning at price 814476.917 wei per compute unit
livepeer-test-orchestrator-sam2-1  | 2024/09/18 00:00:35 INFO Starting external container name=segment-anything-2_facebook-sam2-hiera-large_https://lychee-arugula-rkydxbawjmlawca1.salad.cloud modelID=facebook/sam2-hiera-large
livepeer-test-orchestrator-sam2-1  | I0918 00:00:35.380959       1 starter.go:1347] Capability segment-anything-2 (ID: 32) advertised with model constraint facebook/sam2-hiera-large at price 10312544.441 wei per compute unit
livepeer-test-orchestrator-sam2-1  | I0918 00:00:35.381782       1 starter.go:1621] ***Livepeer Running in Orchestrator Mode***
livepeer-test-orchestrator-sam2-1  | I0918 00:00:35.381836       1 starter.go:1631] Livepeer Node version: 0.7.8-ai.2-5b91100d-dirty
livepeer-test-orchestrator-sam2-1  | I0918 00:00:35.381888       1 mediaserver.go:204] Transcode Job Type: [{P240p30fps4x3 600k 30 0 320x240 4:3 0 0 0s 0 0 0 0} {P360p30fps16x9 1200k 30 0 640x360 16:9 0 0 0s 0 0 0 0}]
livepeer-test-orchestrator-sam2-1  | I0918 00:00:35.382027       1 webserver.go:20] CLI server listening on 127.0.0.1:7777
livepeer-test-orchestrator-sam2-1  | I0918 00:00:35.390879       1 cert.go:83] Private key and cert not found. Generating
livepeer-test-orchestrator-sam2-1  | I0918 00:00:35.395048       1 cert.go:22] Generating cert for 127.0.0.1
livepeer-test-orchestrator-sam2-1  | I0918 00:00:35.402253       1 rpc.go:220] Listening for RPC on :8888
livepeer-test-orchestrator-sam2-1  | I0918 00:00:37.383588       1 rpc.go:305] Connecting RPC to uri=https://127.0.0.1:8888
livepeer-test-orchestrator-sam2-1  | I0918 00:00:37.388396       1 rpc.go:258] Received Ping request
livepeer-test-orchestrator-sam2-1  | I0918 00:00:52.194808       1 block_watcher.go:454] Polling blocks from=254618661 to=254618741

Does this pull request close any open issues?

Checklist:

  • [ ] Read the contribution guide
  • [X] make runs successfully
  • [ ] All tests in ./test.sh pass
  • [ ] README and other documentation updated
  • [ ] Pending changelog updated

ad-astra-video • Sep 18 '24 00:09

@ad-astra-video is this still a problem with the refactors that were included in the AI remote worker?

rickstaa • Nov 13 '24 21:11