How to create a Link property and save the data in OrientDB from Spark?

Open LianaN opened this issue 6 years ago • 1 comments

OrientDB Version: 2.2

Scala Version: 2.11.8

I use the spark-orientdb connector to store a DataFrame in OrientDB.

<dependency>
    <groupId>com.orientechnologies</groupId>
    <artifactId>orientdb-graphdb</artifactId>
    <version>2.2.2</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-orientdb-2.2.1_2.11</artifactId>
    <version>1.4</version>
</dependency>

Everything works fine except for one case. One of the fields of the OrientDB table User should be of type Link: the field ContentLink should point to the row of the table Content identified by its ContentId field.

val cClass = graph.getVertexType("Content")
userA.createProperty("ContentLink", OType.LINK, cClass)
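
For context, here is a minimal sketch of how `graph`, `cClass` and `userA` in the snippet above might be obtained with the OrientDB 2.2 Graph API. It assumes `userA` is the vertex type of User and that `uri`, `username` and `password` are the same connection values used for the Spark write. The object and method names (`LinkSchemaSketch`, `ensureContentLink`) are hypothetical and only serve the illustration.

```scala
import com.orientechnologies.orient.core.metadata.schema.OType
import com.tinkerpop.blueprints.impls.orient.OrientGraphFactory

object LinkSchemaSketch {
  // Hypothetical helper: makes sure User.ContentLink exists as a LINK to Content.
  def ensureContentLink(uri: String, username: String, password: String): Unit = {
    val factory = new OrientGraphFactory(uri, username, password)
    // Schema changes are done on a non-transactional graph instance.
    val graph = factory.getNoTx
    try {
      // getVertexType returns null when the class does not exist yet.
      val cClass = Option(graph.getVertexType("Content"))
        .getOrElse(graph.createVertexType("Content"))
      val userA = Option(graph.getVertexType("User"))
        .getOrElse(graph.createVertexType("User"))

      // Create the property only once; the third argument is the linked class.
      if (!userA.existsProperty("ContentLink")) {
        userA.createProperty("ContentLink", OType.LINK, cClass)
      }
    } finally {
      graph.shutdown()
      factory.close()
    }
  }
}
```

This only covers the schema side. It does not change how the connector writes the column afterwards.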

Then I save the data in Spark as follows:

    users
      .write
      .format("org.apache.spark.orientdb.graphs")
      .option("dburl", uri)
      .option("user", username)
      .option("password", password)
      .option("vertextype", "User")
      .mode(SaveMode.Overwrite)
      .save()

When I create properties using createProperty as shown above, the database structure seems to be correct and the field ContentLink is of the type Link. However, when I write the data into the table, this field is changed to String and the link is lost.

This happens because a Link must hold the RecordId (the physical address, @rid) of the relevant row in the Content table. I tried to retrieve every @rid of the Content table in Spark, but the select on @rid fails with an IndexOutOfBounds error:

    val df_idrid = spark.read
      .format("org.apache.spark.orientdb.graphs")
      .option("dburl", uri)
      .option("user", username)
      .option("password", password)
      .option("vertextype", "Content")
      .option("query", "select @rid, id from Content")
      .load()
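
If such an id-to-@rid DataFrame could be loaded, the lookup described above would amount to a plain Spark join. The sketch below shows only that step, under stated assumptions: the column names `ContentId` and `rid` are hypothetical, the users DataFrame does not already contain a `ContentLink` column, and the sample rows (including the record id `#12:0`) are made up. The result is still a string column, so this alone does not make the connector store a LINK.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ContentRidJoin {
  // users:       one row per user, with the ContentId it refers to (assumed column name).
  // contentRids: one row per Content record, with columns ContentId and rid (assumed names).
  // Returns users plus a ContentLink column holding the matching @rid as a string,
  // or null when no Content row has that ContentId.
  def withContentLink(users: DataFrame, contentRids: DataFrame): DataFrame =
    users
      .join(contentRids, Seq("ContentId"), "left_outer")
      .withColumnRenamed("rid", "ContentLink")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("rid-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data standing in for the real DataFrames.
    val users = Seq(("alice", "c1"), ("bob", "c2")).toDF("name", "ContentId")
    val contentRids = Seq(("c1", "#12:0")).toDF("ContentId", "rid")

    withContentLink(users, contentRids).show()
    spark.stop()
  }
}
```

A left outer join is used so that users without a matching Content row are kept rather than silently dropped.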

How can I solve this issue?

LianaN • Jun 26 '18 23:06

Hi @LianaN, thanks for opening this issue. Can you look at the following test case? https://github.com/orientechnologies/spark-orientdb/blob/master/src/it/scala/org/apache/spark/orientdb/graphs/OrientDBGraphIntegrationSuite.scala#L1462-#L1510

Please let me know if you face further issues.

sbcd90 • Jul 25 '18 05:07