Snapshotting a large table
Hi,
I'm trying to snapshot a large table (~100 million rows) to Kafka in order to bootstrap a replica of a MySQL table on HDFS. I'm using the --no-transaction
flag because I don't have FLUSH permissions on the database. First, I had to extend the timeout in the handleEvent
method. Now I'm running into the following garbage collection error:
```
Exception in thread "metrics-meter-tick-thread-1" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "metrics-meter-tick-thread-3" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "metrics-meter-tick-thread-4" Exception in thread "shutdownHook1" java.lang.OutOfMemoryError: GC overhead limit exceeded
```
From what I can tell, the entire table snapshot is contained within a single SelectEvent. The error occurs a few minutes into the SelectConsumer.handleEvents() loop. Do you have any recommendations for getting around the garbage collection issue? Thanks for all your work on this project!
@mbittmann thanks for the feedback.
The current implementation is very naive in terms of handling large tables. I've been looking at similar projects to see how they handle this, and I like the way Sqoop can split a table into multiple parts based on a split-by column. I'm going to implement similar functionality for mypipe soon unless someone else gets to it first (=
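For anyone following along, here's a rough sketch of what split-by chunking could look like over plain JDBC. The table and column names, chunk size, and connection details are all hypothetical, and this isn't mypipe's actual code:

```scala
// Sketch of split-by chunking over JDBC. big_table, id, the chunk size,
// and the connection URL are all hypothetical placeholders.
import java.sql.DriverManager

object SplitBySnapshot {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection(
      "jdbc:mysql://localhost:3306/db", "user", "pass")
    try {
      val chunkSize = 10000L

      // 1. Find the bounds of the split-by column (assumed numeric, indexed).
      val bounds = conn.createStatement()
        .executeQuery("SELECT MIN(id), MAX(id) FROM big_table")
      bounds.next()
      val (min, max) = (bounds.getLong(1), bounds.getLong(2))

      // 2. Walk the key range in fixed-size slices so no single result set
      //    (or event) ever has to hold the whole table.
      var lo = min
      while (lo <= max) {
        val hi = math.min(lo + chunkSize - 1, max)
        val stmt = conn.prepareStatement(
          "SELECT * FROM big_table WHERE id BETWEEN ? AND ?")
        stmt.setLong(1, lo)
        stmt.setLong(2, hi)
        val rs = stmt.executeQuery()
        while (rs.next()) {
          // emit one row at a time (e.g. publish to Kafka) instead of buffering
        }
        rs.close(); stmt.close()
        lo = hi + 1
      }
    } finally conn.close()
  }
}
```

The key point is that each slice is a bounded query, so memory stays flat regardless of table size.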
In the meantime, you can try giving the JVM more memory (e.g. via -Xmx) and see if that helps, although that's a temporary workaround at best.
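For reference, if you're launching through sbt, one way to pass a bigger heap to a forked JVM is sketched below; the 8g figure is arbitrary, and -XX:-UseGCOverheadLimit only suppresses this particular error rather than fixing the underlying allocation problem:

```scala
// build.sbt — hedged example: run in a forked JVM with a larger heap.
fork := true
javaOptions ++= Seq(
  "-Xmx8g",                   // arbitrary size; tune to the host
  "-XX:-UseGCOverheadLimit"   // masks the symptom, not a fix
)
```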
Thanks for the reply! That makes sense. I ended up going with Sqoop to bootstrap the tables, which also has the advantage of bypassing Kafka. There were a few serialization issues to tackle with SQL column types being mapped to different Avro types, such as timestamps and certain flavors of TINYINT.
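In case it helps others, the fixes looked roughly like the sketch below. The type mappings shown are illustrative assumptions about the conversions involved, not Sqoop's exact behavior:

```scala
import java.sql.Timestamp

object ColumnNormalizer {
  /** Sketch: coerce MySQL JDBC values into shapes an Avro schema expects.
    * These mappings are hypothetical examples, not Sqoop's actual rules. */
  def normalize(columnType: String, value: Any): Any = (columnType, value) match {
    // MySQL TINYINT(1) surfaces as a small integer over JDBC; a downstream
    // Avro boolean field needs an explicit conversion.
    case ("TINYINT", n: Number) => n.intValue() != 0
    // Classic Avro (before logical types) has no timestamp type; epoch
    // millis stored in a long is one common convention.
    case ("TIMESTAMP", ts: Timestamp) => ts.getTime
    case (_, v) => v
  }
}
```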
Starting to make progress here, @mbittmann. See commit ~~63d1f43e0f1d025d511052e27a7e5b03e165a3bc~~ 6aff568244026bea87e438f526dd3969a9a81536.