Computer hardware for high-channel (1000+) datasets
Hello, I'm not sure if this is the right place to post this, but I'm looking into purchasing a PC that will be used for both recording and spike sorting from 1000+ channel data via ~eight 128-channel Diagnostic Biochips Deep Array linear probes recording simultaneously. I have reviewed the hardware recommendations page in the readthedocs, but I'm unsure exactly how that scales up to this many channels.
The main specs I'm focused on right now are the GPU and RAM. What sort of GPU would be required for this? Would 24GB of VRAM be sufficient (e.g. NVIDIA 4090)? As for RAM, does this scale up linearly with channel count, or is it mainly just recording length that matters? Our recording sessions would likely not be more than 3-4 hours. Would 64GB be sufficient? And is there anything to gain from ECC RAM or higher speed RAM?
As for the rest, I'm thinking of an 8-core Intel Xeon Processor (unless the i9 series is just as good), a fast and large NVMe SSD, and a large HDD for intermediate-term storage. Our hope is to be able to record a session and then set up Kilosort to run overnight and be finished by the morning.
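For sizing the NVMe SSD and the HDD, a rough back-of-the-envelope estimate of the raw file size may help. This is only a sketch: the 30 kHz sampling rate and 16-bit samples are assumptions that are not stated in the post, and `raw_size_bytes` is a made-up helper name.

```python
def raw_size_bytes(n_channels, sample_rate_hz, duration_s, bytes_per_sample=2):
    """Size in bytes of an uncompressed recording (channels x samples x bytes)."""
    return n_channels * sample_rate_hz * duration_s * bytes_per_sample


# Assumed values: 30 kHz sampling and int16 samples are not given in the post.
SAMPLE_RATE_HZ = 30_000

for hours in (3, 4):
    size = raw_size_bytes(1024, SAMPLE_RATE_HZ, hours * 3600)
    print(f"{hours} h, 1024 ch: {size / 1e9:.0f} GB ({size / 2**30:.0f} GiB)")
```

Under those assumptions a single 3-4 hour session comes to somewhere between roughly 0.66 and 0.88 TB of raw data, before any sorting output or preprocessed copies.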
24 GB of video RAM should be sufficient. System RAM mainly depends on spike count. 64GB should be enough for a 3-4 hour recording even with the high channel count, but if you're expecting a lot of units / high firing rates then you may need more (like 128 GB). ECC RAM isn't necessary. Higher-speed RAM should improve sorting time, but I can't give you a good estimate of how big an impact that will make.
The other hardware you mentioned is all fine. Most of KS4's heavy lifting is done on GPU, so the CPU isn't too important. An i9 series processor would be plenty fast enough.
@ajp221 if the probes are independent (sounds like they are) it could be a good idea to run them through Kilosort 128 channels at a time in sequence, rather than as a big 1024-channel recording. Would dramatically reduce your hardware requirements.
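A minimal sketch of what sorting the probes one after another could look like with the Kilosort4 Python API. It assumes each probe was saved to its own binary file; the file layout, the probe file name `deep_array_128.json`, and the helper names are all hypothetical. If all probes share one file, the per-probe channel map would have to select that probe's channels instead, which is not shown here.

```python
from pathlib import Path

N_PROBES = 8            # from the post: about eight probes
CHANS_PER_PROBE = 128   # from the post: 128-channel probes


def probe_jobs(session_dir, n_probes=N_PROBES):
    """Build one job per probe; assumes one .bin file per probe (hypothetical layout)."""
    session_dir = Path(session_dir)
    return [
        {
            "filename": session_dir / f"probe{i}.bin",
            "results_dir": session_dir / f"probe{i}_kilosort4",
            "probe_name": "deep_array_128.json",  # hypothetical probe file
        }
        for i in range(n_probes)
    ]


def sort_session(session_dir):
    """Run Kilosort4 on each probe in sequence, e.g. as an overnight job."""
    # Imported here so the job list can be built without Kilosort installed.
    from kilosort import run_kilosort

    settings = {"n_chan_bin": CHANS_PER_PROBE}  # channels stored in each binary file
    for job in probe_jobs(session_dir):
        run_kilosort(settings=settings, **job)
```

Because each run only sees 128 channels, the GPU memory needed per run should be far lower than for a single 1024-channel sort, at the cost of running eight jobs back to back.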
The 40xx GPU cards are basically obsolete and replaced by 50xx. I would stay away from xx90 because of the huge power requirement. Going forward, the Nvidia RTX PRO Blackwell cards will likely be the best, as they are smaller, lower power, and have high-density VRAM. They are very new and just this week becoming available in 4000 (24GB) and 4500 (32GB) models.
I would not say the 40xx GPUs are obsolete. The performance bump to 50xx is very minor, especially in Kilosort. If you are building a new computer around it, make sure the power requirements are satisfied, but that is not in itself a big reason to skip the 4090. If you get a good deal on a 4090, it will be substantially faster than the RTX PRO 4000/4500 (especially vs the 4000) for significantly less money.
Is vRAM the sole relevant spec in a GPU for kilosort, or do things like memory bandwidth, cores, TFLOPS matter too? For example, here are some options within the ~$1000 - $2500 price range. Would any of these options be a particularly good fit for us?
| GPU | VRAM | Memory Bandwidth | CUDA Cores/Tensor Cores | FP16 | Price |
|---|---|---|---|---|---|
| RTX 4090 | 24 GB GDDR6X | 1.01 TB/s | 16384/512 | 82.58 TFLOPS | $1600 |
| RTX 5080 | 16 GB GDDR7 | 960 GB/s | 10752/336 | 56.28 TFLOPS | $1000 |
| RTX Pro 4000 Blackwell | 24 GB GDDR7 | 672 GB/s | 8960/280 | 36.83 TFLOPS | $1500 |
| RTX 4000 Ada | 20 GB GDDR6 | 360 GB/s | 6144/192 | 26.73 TFLOPS | $1500 |
| RTX 5090 | 32 GB GDDR7 | 1.79 TB/s | 21760/680 | 104.8 TFLOPS | $2500+ |
Are any of these particularly overkill or underkill? Our main goal is to just be able to run one ~1024ch 3-4hr session's worth of data overnight and have it finished by the morning. So as long as we could do this in <12 hours, we would be fine. Our system RAM would likely be 64 or 128GB of DDR5 4800MHz.
That is true, Marius. Should have chosen my words more carefully. Haven’t evaluated xx90 cards as they won’t fit in my Dell case. They are also power hungry and run hot in my short experience. These are the cards that were melting power connectors. The Blackwells are smaller, narrower, lower power, and I like the single blower fan instead of 3 or 4 side fans. 4090s are presently around $3K which is a good deal. 4500 Blackwell is $2500. Anyway, I’ll report some testing numbers. Harvey
@ajp221 I am a little hesitant to recommend one over the other because I don't know how much VRAM you will really need. Generally, trying to maximize VRAM size is not a bad idea for future-proofing, so I'd suggest at least the 4090 or the RTX PRO 4000. The latter will be ~2x slower in Kilosort but take 1/3 of the power. Either one should finish processing your recordings overnight. I am personally not too worried about the higher power requirements of the 4090; we've been running xx90 series cards with no issues so far.
@HWWiggins the RTX PRO 6000 is a 600W card with the same connector, and those run fine in many compute environments. There are power supplies that have temperature sensors on their 12VHPWR cables and that's what we use.
> @ajp221 if the probes are independent (sounds like they are) it could be a good idea to run them through Kilosort 128 channels at a time in sequence, rather than as a big 1024-channel recording. Would dramatically reduce your hardware requirements.

I've realized what Marius said here is the correct method that we'll be using (thanks for pointing that out). We'll just be sorting one 128-channel probe at a time, and run all 8-10 probes in sequence overnight. That said, would you say that something along the lines of an RTX 5070 Ti, 5080, or 5060 Ti would be sufficient? (All 3 are 16GB cards)
Just FYI, report of KS4 sorting performance on a high-density probe with 3 different GPU cards
Computer: Dell 12-core Xeon 5860 workstation with 32 GB RAM, probe file resident on a 4 TB NVMe M.2 drive
Probe: 8-shank, 1024-pixel SiNAPS HD probe; true 1024 channels, so not a 384 subset
Recording: 1-hour file recorded at 40 kHz into PL2 format, 289 GB
Processing: times are KS4 only and include the high-pass filter, but do not include preprocessing, creation of the BIN file for KS4 input, or file copying. Default KS4 params.
Gigabyte 5070 Ti/16GB VRAM: errored out with insufficient VRAM. A 30 min file completed after 3.3 hours.
The 4090 has 24GB VRAM and the 5090 has 32GB VRAM, so both of those would handle 1 hour and would most likely have the shortest run time, as Marius points out. Don’t have a 5090 to try. Will experiment with the KS4 capability to process shanks separately at some point.
RTX Pro 4500 Blackwell/32GB VRAM, 2-slot wide: completed in 2.5 hours
RTX Pro 4000 Blackwell/24GB VRAM, 1-slot wide: completed in 3.2 hours
Will try a 2-hour file with the 4000 to determine if 24GB is sufficient.
The sorting problem is going to get worse as eventually there will be 2048 pixel probes and multiple shanks.
From these numbers you can approximate roughly how other configurations would perform, realizing the spike rate has an effect as well.
I expect for your case that 16GB is sufficient with 5070 Ti or 5080.
HW
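As a rough illustration of how the benchmark numbers above could be extrapolated, here is a sketch that scales runtime linearly with channels times samples. That linear scaling is a crude assumption, not something measured in this thread; spike rate, probe geometry, and sorting 128 channels at a time will all shift the real numbers. The 30 kHz rate for the target recording is also assumed, and `estimate_hours` is a made-up helper.

```python
# Benchmarks reported above: 1 h, 1024 channels, 40 kHz, default KS4 parameters.
BENCH_HOURS = {"RTX Pro 4500 Blackwell": 2.5, "RTX Pro 4000 Blackwell": 3.2}
BENCH_CHANNEL_SAMPLES = 1024 * 40_000 * 3600


def estimate_hours(bench_hours, n_channels, sample_rate_hz, duration_h):
    """Scale a benchmark runtime linearly with channels x samples (crude assumption)."""
    channel_samples = n_channels * sample_rate_hz * duration_h * 3600
    return bench_hours * channel_samples / BENCH_CHANNEL_SAMPLES


# Target: 8 probes x 128 channels, 4 h each; 30 kHz is assumed, not from the thread.
for gpu, hours in BENCH_HOURS.items():
    total = 8 * estimate_hours(hours, 128, 30_000, 4)
    print(f"{gpu}: ~{total:.1f} h for all 8 probes")
```

With these inputs the estimate lands at about 7.5 hours and 9.6 hours for the two cards, both under a 12-hour overnight window, but it should be treated as an order-of-magnitude guide only.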
Agreed, any 50 series 16GB card should be sufficient for overnight runs. I expect about a factor of two speed difference between the 5060 Ti and the 5080. The 5070 Ti is pretty close to the 5080 and generally much more affordable.