scep
FileDepot improvements
This SCEP server performs very well right out of the box. However, it has one serious flaw: FileDepot isn't very performant.
A quick test reveals some serious issues.
Run the command below against scep-server (I was running mine in Docker):
seq 10 | parallel './scepclient-darwin-arm64 -private-key ./key.key -server-url=http://localhost:9000/scep -challenge=abc' | \grep "badRequest" -c
I ran this test 10 times, and the results varied between 3 and 6 occurrences of badRequest.
I did receive responses very quickly, but what does performance mean in this case if the error rate is ~50%?
Error line from the server:
ts=2022-02-28T10:25:33.055121419Z caller=service.go:88 msg="failed to sign CSR" err="open depot/scepclient.724.pem: file exists"
I'm not sure how your command worked at all past the first iteration, as scepserver throws a bad request at me if the same DN is requested twice, even when the requests aren't run in parallel.
I can't replicate this on main using:
seq 10 | parallel 'mkdir -p test{#} && cd test{#} && ../scepclient-linux-amd64 -private-key ./key.key -organization client{#} -server-url=http://localhost:2016/scep -challenge=secret'
Granted, I'm on an NVMe drive, so this might be caused by data races that only show up on slower disks. I think depot/file/depot.go:SignCSR should be wrapped in a sync.Mutex to avoid data races with the filesystem. In some initial testing, it was still able to handle quite a few requests per second.
I guess the command worked because I run scep-server with -allowrenew=0.
I also run it in local Docker and in AWS EKS, so the drive may play a role.
What's interesting is that, after running that command several times, the serial file seems to be stuck at 02, even though I already have over 700 pem files on the drive.
index.txt contains correct data, with records for all pem files, but serial is way off.
As a result, I can no longer generate any certificate; every single request fails with ts=2022-03-01T08:51:00.788576137Z caller=service.go:88 msg="failed to sign CSR" err="open depot/scepclient.2.pem: file exists"
I'll just add that I restarted the scep-server container in the meantime.
I opened a PR that adds a mutex to the depot to fix this issue.
Thanks @korylprince !
Your fix helped, but did not solve the issue completely. I tested your fix, and now I'm getting 1-3 failures when running seq 15 | parallel...
My steps:
- check out the branch with the fix
- run make release
- update the Dockerfile from the repo to use scepclient-darwin-arm64 and scepserver-darwin-arm64 (because I'm on a MacBook with an M1)
- build the Docker image using docker build . -t scep-test:1
- run the Docker image with the following docker-compose file using docker-compose -f docker-compose.yml up scep-server

where docker-compose.yml looks like this:
version: '2'
services:
  scep-server:
    build:
      context: .
      dockerfile: Dockerfile-scep
    ports:
      - 9000:8080
    entrypoint: ['/bin/sh', '-c', "/usr/bin/scepserver -challenge=asd -capass=qwe -allowrenew=0"]
and Dockerfile-scep looks like this:
FROM scep-test:1
RUN rm -fr /usr/bin/depot
RUN /usr/bin/scepserver ca -init -organization test.org -organizational_unit testUnit -keySize=1028 -key-password=qwe
- run seq 10 | parallel './scepclient-darwin-arm64 -organization client{#} -private-key ./key.key -server-url=http://localhost:9000/scep -challenge=asd'
With the above setup I'm still getting ts=2022-03-02T11:15:56.90954072Z caller=service.go:88 msg="failed to sign CSR" err="open depot/scepclient.54.pem: file exists"
The difference now is that the counter in the serial file is only off a bit.
I also started noticing log lines such as ts=2022-03-02T11:15:56.889752595Z caller=service.go:88 msg="failed to sign CSR" err="open depot/serial: no such file or directory"
and
ts=2022-03-02T11:05:50.299688759Z caller=service.go:88 msg="failed to sign CSR" err="open depot/serial: file exists"
I cannot replicate this issue with my branch. I even simulated very slow disk speeds and ran 100 concurrent requests, and everything worked correctly.
Also note that the serial is stored in hex. So after running 100 concurrent requests, the serial file has the value 65, which is 101 in decimal, since client certificates start at 2.
OK, so I tested your fix in the target environment (AWS EKS) instead of locally, and it seems to work just fine.
@korylprince, your PR was merged, so this issue can be closed.
@groob, any chance of an official release?