gokogiri

Parsing XML dies (stays blocked) when done in parallel

Open krezac opened this issue 9 years ago • 2 comments

Hello all, I discovered this while load-testing my app on Windows 7. When multiple goroutines parse XML concurrently, sooner or later all of them get stuck in xml.Parse() and the CPU load drops to zero.

It is much more likely to happen with go run than when building and running the executable. So far I haven't been able to reproduce it on Linux.

See the attached sample code (no requests, just 10 goroutines running in a loop). It should time out after 20 seconds (or stop after 10000 iterations), but with go run it usually ends up like this:

Id 8, iter: 571, elapsed: 5.018003
Id 8, iter: 572, elapsed: 5.021004
Id 8, iter: 573, elapsed: 5.023504
Id 8, iter: 574, elapsed: 5.026005
Id 8, iter: 575, elapsed: 5.029006
Done -- this is outputted after 20s - timeout expired

Any idea what might be wrong? Thanks in advance.

Attachment: gokogiri-load.zip
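For readers without the attachment, a harness of the shape described above might look roughly like the sketch below. It is not the attached code: parseOnce and runLoad are hypothetical names, and parseOnce uses the standard library's encoding/xml as a stand-in so the sketch runs without cgo or libxml2. To reproduce the report, the gokogiri parse call would go in its place.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// parseOnce is a hypothetical stand-in for the gokogiri parse call from the
// report. It uses encoding/xml so the sketch needs no cgo or libxml2.
func parseOnce(data []byte) error {
	var v struct {
		XMLName xml.Name `xml:"root"`
		Items   []string `xml:"item"`
	}
	return xml.Unmarshal(data, &v)
}

// runLoad starts `workers` goroutines that each parse the same input up to
// maxIter times. It returns the iterations completed per goroutine and
// whether all goroutines finished before the timeout.
func runLoad(workers, maxIter int, timeout time.Duration) ([]int64, bool) {
	data := []byte(`<root><item>a</item><item>b</item></root>`)
	counts := make([]int64, workers)
	done := make(chan struct{})

	var wg sync.WaitGroup
	for id := 0; id < workers; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := 0; i < maxIter; i++ {
				if err := parseOnce(data); err != nil {
					return
				}
				atomic.AddInt64(&counts[id], 1)
			}
		}(id)
	}
	go func() {
		wg.Wait()
		close(done)
	}()

	// Wait for completion or the timeout, whichever comes first. If the
	// workers are blocked, they are left behind and the timeout wins.
	finished := false
	select {
	case <-done:
		finished = true
	case <-time.After(timeout):
	}

	// Read the counters atomically, since workers may still be running.
	snapshot := make([]int64, workers)
	for i := range counts {
		snapshot[i] = atomic.LoadInt64(&counts[i])
	}
	return snapshot, finished
}

func main() {
	start := time.Now()
	counts, finished := runLoad(10, 10000, 20*time.Second)
	fmt.Printf("Done, finished=%v, iterations=%v, elapsed: %f\n",
		finished, counts, time.Since(start).Seconds())
}
```

With a harness like this, a hang would show up as finished=false together with iteration counts well below 10000.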

krezac, Jan 07 '16 15:01

The underlying library (libxml2) does not guarantee thread safety for multiple threads sharing the same document. It can be done but requires the caller (gokogiri in this case) to handle any locking and synchronization. I'm not surprised it has issues under load.

You should be able to parse different documents in each goroutine safely, though I don't know how feasible that is for your actual application.
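As a rough illustration of what "the caller handles locking" means, here is a minimal Go sketch in which several goroutines modify one shared document only while holding a mutex. The sharedDoc type, its withLock method, and appendConcurrently are hypothetical stand-ins invented for this example. They are not part of gokogiri or libxml2, and a plain string slice takes the place of a real document handle.

```go
package main

import (
	"fmt"
	"sync"
)

// sharedDoc is a hypothetical stand-in for a parsed document handle that is
// not safe for concurrent use on its own.
type sharedDoc struct {
	mu    sync.Mutex
	nodes []string
}

// withLock runs fn while holding the document's mutex, so only one
// goroutine touches the underlying data at a time.
func (d *sharedDoc) withLock(fn func(nodes *[]string)) {
	d.mu.Lock()
	defer d.mu.Unlock()
	fn(&d.nodes)
}

// appendConcurrently has `workers` goroutines each add `perWorker` nodes to
// a single shared document and returns the final node count.
func appendConcurrently(workers, perWorker int) int {
	doc := &sharedDoc{}
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				doc.withLock(func(nodes *[]string) {
					*nodes = append(*nodes, fmt.Sprintf("w%d-n%d", w, i))
				})
			}
		}(w)
	}
	wg.Wait()

	var n int
	doc.withLock(func(nodes *[]string) { n = len(*nodes) })
	return n
}

func main() {
	fmt.Println(appendConcurrently(10, 100)) // prints 1000
}
```

The alternative mentioned above, one document per goroutine, avoids the mutex entirely because nothing is shared.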

jbowtie, May 21 '16 06:05

Thanks for your reply. I was just curious whether it's a common problem (we eventually handled the issue a different way).

krezac, May 22 '16 19:05