matrixone
matrixone copied to clipboard
[Performance]: spark sql java application insert speed low.
Is there an existing issue for performance?
- [X] I have checked the existing issues.
Environment
- Version or commit-id (e.g. v0.1.0 or 8b23a93): v0.5.0
- Hardware parameters: 20G memory
- OS type: ubuntu 20.04
- Others: use docker run to start MO.
Details of Performance
spark sql jar application. standardalone, single node to execute spark java application, use dataset.write() function:
Dataset<Row> testRandom = spark.sql("select fid, abs(int(rand()*1000)) as id from test limit 100000"); //spark sql testRandom.write() .mode(SaveMode.Append) .option("batchsize", "10000") .option("isolationLevel", "NONE") .jdbc(dbConnectionUrl, "test.test1", props);
insert data includes 2 column, type is int, insert row count is 100,000, random number like fid id 0 6008889 1 4988098
Insert speed benchmark:
spark to clickhouse:
2493 ms
spark to mo: 853625 ms
spark to mysql: 542404 ms
Additional information
For example: Have you compared MatrixOne with other databases? If yes, what's their difference?
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>3.1.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>3.1.1</version>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>8.0.30</version>
</dependency>
<dependency>
<groupId>com.clickhouse</groupId>
<artifactId>clickhouse-jdbc</artifactId>
<version>0.3.2-patch11</version>
<!-- use uber jar with all dependencies included, change classifier to http for smaller jar -->
<classifier>all</classifier>
<exclusions>
<exclusion>
<groupId>*</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
btw, I found "isolationLevel" can not be set to other values. Please check the spark website for more info, https://spark.apache.org/docs/3.1.1/sql-data-sources-jdbc.html
isolationLevel: The transaction isolation level, which applies to current connection. It can be one of NONE, READ_COMMITTED, READ_UNCOMMITTED, REPEATABLE_READ, or SERIALIZABLE, corresponding to standard transaction isolation levels defined by JDBC's Connection object, with default of READ_UNCOMMITTED. This option applies only to writing. Please refer the documentation in java.sql.Connection.
@yangtian9999 I think you can reassign it to CN guys.
First can we check if 1. prepared, 2. one big transaction instead of many small ones. If all true, reassign to CN S3 write team