Michael Feil
@casper-hansen I'm open to collaboration, but there has been no further progress, unfortunately.
Interesting - torch.compile does not seem to work then. You might need gcc installed as a C++ compiler, ideally via e.g. build-essential. Otherwise, could you provide some longer logs - seems...
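A quick way to check the compiler suggestion above is to see whether any C/C++ compiler is visible on `PATH` from the same environment that runs the script. This is only a minimal sketch: the helper name `find_cxx_compiler` and the candidate list are made up for illustration, and finding a compiler does not guarantee that torch.compile will work.

```python
import shutil


def find_cxx_compiler(candidates=("g++", "gcc", "clang++", "c++")):
    """Return the path of the first candidate found on PATH, or None.

    Hypothetical helper for illustration; the candidate names are an
    assumption, not a list taken from PyTorch.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    compiler = find_cxx_compiler()
    if compiler is None:
        print("No C/C++ compiler found on PATH (on Debian/Ubuntu, build-essential provides one).")
    else:
        print(f"Found compiler: {compiler}")
```

If this prints a path and torch.compile still fails, longer logs are the more useful next step.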
Oh, I missed the segfault at the end of the script. What GPU is this on? Is the same happening via dockerfile (cuda12.1)? Have you used other models with torch.compile?
completed `--lengths-via-tokenize`
Okay, if the docker image runs, I can provide no further assistance - it's too hard to debug. I would guess some C++ extension might be incompatible. Please install all...
What's the advantage of your pip freeze? Is it more helpful than the poetry lock? https://github.com/michaelfeil/infinity/blob/main/libs/infinity_emb/poetry.lock
Closing as stale.
Good idea, I assume because the payload is stringified before it is sent. On the other hand, JSON encoding took around 20% of the CPU, and in some cases was responsible...
I slightly optimized queueing - I don't think the decimals in the json would significantly influence the throughput.
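To test the decimals question on a given machine, a rough micro-benchmark along these lines could compare encoding full-precision floats against rounded ones. It is a sketch using only the standard-library `json` module, not the server's actual serialization path; `encode_embeddings` and the vector shapes are hypothetical.

```python
import json
import random
import timeit


def encode_embeddings(vectors, decimals=None):
    """JSON-encode a list of float vectors, optionally rounding each value.

    Hypothetical helper for illustration only.
    """
    if decimals is not None:
        vectors = [[round(x, decimals) for x in vec] for vec in vectors]
    return json.dumps(vectors)


if __name__ == "__main__":
    random.seed(0)
    # Assumed shape: 32 vectors of 768 floats each.
    batch = [[random.random() for _ in range(768)] for _ in range(32)]

    for decimals in (None, 4):
        seconds = timeit.timeit(lambda: encode_embeddings(batch, decimals), number=20)
        size = len(encode_embeddings(batch, decimals))
        print(f"decimals={decimals}: {seconds:.3f}s for 20 runs, {size} bytes per payload")
```

Rounding shrinks the payload, but the rounding pass itself costs CPU time, so the net effect on throughput is worth measuring rather than assuming.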