
Update TinkerPop 3.6.1

Open farodin91 opened this issue 3 years ago • 1 comments

Signed-off-by: Jan Jansen [email protected]


Thank you for contributing to JanusGraph!

In order to streamline the review of the contribution we ask you to ensure the following steps have been taken:

For all changes:

  • [ ] Is there an issue associated with this PR? Is it referenced in the commit message?
  • [ ] Does your PR body contain #xyz where xyz is the issue number you are trying to resolve?
  • [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
  • [ ] Is your initial contribution a single, squashed commit?

For code changes:

  • [ ] Have you written and/or updated unit tests to verify your changes?
  • [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • [ ] If applicable, have you updated the LICENSE.txt file, including the main LICENSE.txt file in the root of this repository?
  • [ ] If applicable, have you updated the NOTICE.txt file, including the main NOTICE.txt file found in the root of this repository?

For documentation related changes:

  • [ ] Have you ensured that format looks appropriate for the output in which it is rendered?

farodin91 avatar Sep 15 '22 16:09 farodin91

I will split this update into multiple PRs.

farodin91 avatar Sep 19 '22 17:09 farodin91

@porunov Hbase upgrade is a pain.

farodin91 avatar Oct 01 '22 12:10 farodin91

@porunov Hbase upgrade is a pain.

I imagine it is ...

porunov avatar Oct 01 '22 12:10 porunov

@porunov @FlorianHockmann @li-boxuan @rngcntr Does anyone have a bit of time to look into the HBase Hadoop tests?

farodin91 avatar Oct 03 '22 07:10 farodin91

It looks like this is one of the bugs stopping me from getting this to run properly: https://github.com/apache/hbase/pull/4819/files

Other Apache projects have deactivated their tests for the combination of HBase 2 with Hadoop 3, e.g. https://github.com/apache/ranger

farodin91 avatar Oct 09 '22 14:10 farodin91

@farodin91

The TinkerPop (TP) tests are skipped.

It would be nice to run them by adding [tp-tests] to the commit message.

mad avatar Oct 12 '22 13:10 mad

@mad I would like to find a way to fix the HBase test beforehand.

farodin91 avatar Oct 12 '22 17:10 farodin91

@farodin91

Some info about the HBase issue:

A scan of the janusgraph table in HBase returns data, but the region metadata suggests that no data exists.

Scan response:

hbase:003:0> scan 'janusgraph', {LIMIT => 10}
ROW                                                 COLUMN+CELL                                                                                                                                            
 \x00\x00\x00\x00\x00\x00\x00\x03                   column=i:\xFF\xFF\xFF\xFF\xFF\xFE\xC7\x7F\x00\x00\x01\x83\xD0\xEB\xE0\x807f000101691244-bic-pc1, timestamp=2022-10-13T10:37:42.913, value=             
 \x00\x00\x00\x00\x00\x00\x00\x04                   column=i:\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x9B\x00\x00\x01\x83\xD0\xEB\xDF(7f000101691244-bic-pc1, timestamp=2022-10-13T10:37:42.569, value=                
 \x00\x00\x00\x00\x00\x00\x00\x04                   column=i:\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xCD\x00\x00\x01\x83\xD0\xEB\xDD\xCA7f000101691244-bic-pc1, timestamp=2022-10-13T10:37:42.219, value=             
 \x00\x00\x00\x00\x00\x00\x02\x0D                   column=e:\x02, timestamp=2022-10-13T10:37:44.127, value=\x00\x01\x08\x80                                                                               
 \x00\x00\x00\x00\x00\x00\x02\x0D                   column=e:\x10\xC0, timestamp=2022-10-13T10:37:44.127, value=\xA0vl\x1EvertexKe\xF9\x04\x80                                                             
 \x00\x00\x00\x00\x00\x00\x02\x0D                   column=e:\x10\xC2\x80\x14\x00, timestamp=2022-10-13T10:37:44.127, value=\x8F\x00\x01\x8E\x00\x8F\x80                                                   
 \x00\x00\x00\x00\x00\x00\x02\x0D                   column=e:\x10\xC2\x80\x18\x00, timestamp=2022-10-13T10:37:44.127, value=\x8F\x00\x01\x8E\x00\x90\x80                                                   
 \x00\x00\x00\x00\x00\x00\x02\x0D                   column=e:\x10\xC4, timestamp=2022-10-13T10:37:44.127, value=\x00\x82\x0C\x80                                                                           
 \x00\x00\x00\x00\x00\x00\x02\x0D                   column=e:\x10\xC8, timestamp=2022-10-13T10:37:44.127, value=\x00\x80\x00\x01\x83\xD0\xEB\xE1\xFF\x10\x80                                               
 \x18\xD4{\x96\x10\xA5\xA0vl\x1EvertexKe\xF9        column=g:\x00, timestamp=2022-10-13T10:37:44.127, value=\x04\x8D                                                                                       
 configuration                                      column=s:graph.janusgraph-version, timestamp=2022-10-13T10:37:42.043, value=\x92\xA01.0.0-SNAPSHO\xD4                                                  
 configuration                                      column=s:graph.storage-version, timestamp=2022-10-13T10:37:42.047, value=\x92\xA0\xB2                                                                  
 configuration                                      column=s:graph.timestamps, timestamp=2022-10-13T10:37:42.033, value=\xB6\x82                                                                           
 configuration                                      column=s:hidden.frozen, timestamp=2022-10-13T10:37:42.049, value=\x8F\x01                                                                              
 configuration                                      column=s:ids.num-partitions, timestamp=2022-10-13T10:37:42.035, value=\x8C\x82                                                                         
 configuration                                      column=s:storage.drop-on-clear, timestamp=2022-10-13T10:37:42.039, value=\x8F\x00                                                                      
 configuration                                      column=s:system-registration.7f000101691244-bic-pc1.startup-time, timestamp=2022-10-13T10:37:42.153, value=\xC1\x80\x00\x00\x00cG\xEAv\x01\x11ta\x80   
 \x88\x00\x00\x00\x00\x00\x00\x00                   column=i:\xFF\xFF\xFF\xFF\xFF\xFF\xD8\xEF\x00\x00\x01\x83\xD0\xEB\xE2\x077f000101691244-bic-pc1, timestamp=2022-10-13T10:37:43.303, value=             
 \x88\x00\x00\x00\x00\x00\x00\x03                   column=i:\xFF\xFF\xFF\xFF\xFF\xFE\xC7\x7F\x00\x00\x01\x83\xD0\xEB\xE3c7f000101691244-bic-pc1, timestamp=2022-10-13T10:37:43.651, value=                
 \x88\x00\x00\x00\x00\x00\x00\x80                   column=e:\x02, timestamp=2022-10-13T10:37:44.001, value=\x00\x01\x04\x91                                                                               
 \x88\x00\x00\x00\x00\x00\x00\x80                   column=e:$, timestamp=2022-10-13T10:37:44.001, value=\x04\x8D\x08\x91\xFF                                                                              
 \xFA<[T\x11\xA5\x82                                column=g:\x00\x04\x8D\x0C\x80, timestamp=2022-10-13T10:37:44.127, value=\x04\x8D                                                                       
9 row(s)
Took 0.0498 seconds                                    

Metadata response:

hbase:004:0> scan 'hbase:meta', {FILTER=>"PrefixFilter('janusgraph')", COLUMNS=>['info:regioninfo']}
ROW                                                 COLUMN+CELL                                                                                                                                            
 janusgraph,,1665657444679.32f32af13dd119840a3d0b3c column=info:regioninfo, timestamp=2022-10-13T10:37:41.600, value={ENCODED => 32f32af13dd119840a3d0b3c75e01e99, NAME => 'janusgraph,,1665657444679.32f32
 75e01e99.                                          af13dd119840a3d0b3c75e01e99.', STARTKEY => '', ENDKEY => ''}                                                                                           
1 row(s)

STARTKEY and ENDKEY are both empty.

So that leads to empty input splits here: org.janusgraph.hadoop.formats.hbase.HBaseBinaryInputFormat#getSplits

mad avatar Oct 13 '22 10:10 mad

@mad Any idea why the metadata is empty?

farodin91 avatar Oct 13 '22 11:10 farodin91

@farodin91

Actually, the root cause is Spark: https://spark.apache.org/docs/3.2.0/core-migration-guide.html#upgrading-from-core-31-to-32

Since Spark 3.2, spark.hadoopRDD.ignoreEmptySplits is set to true by default which means Spark will not create empty partitions for empty input splits. To restore the behavior before Spark 3.2, you can set spark.hadoopRDD.ignoreEmptySplits to false.
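
To illustrate the behavior described in the migration guide, here is a minimal sketch in Python. It is a simplified model, not Spark's code: the `Split` class, the `partitions` function, and the split name are hypothetical, and it assumes the HBase split in the failing test reports a length of zero, which is consistent with the empty input splits observed above but not confirmed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    """Hypothetical stand-in for a Hadoop input split: a name and a reported length in bytes."""
    name: str
    length: int


def partitions(splits, ignore_empty_splits):
    """Return the splits that would be turned into partitions.

    Models only the documented effect of spark.hadoopRDD.ignoreEmptySplits:
    when it is true, splits reporting zero length are skipped.
    """
    if ignore_empty_splits:
        return [s for s in splits if s.length > 0]
    return list(splits)


# Assumption: the single janusgraph region yields one split that reports length 0.
splits = [Split("janusgraph-region-0", 0)]

print(len(partitions(splits, ignore_empty_splits=True)))   # Spark 3.2+ default
print(len(partitions(splits, ignore_empty_splits=False)))  # behavior before Spark 3.2
```

Under that assumption, the default in Spark 3.2 leaves no partition to read, even though the table itself contains rows.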

So just add spark.hadoopRDD.ignoreEmptySplits=false to hbase-read.properties and hbase-read-snapshot.properties.
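
As a sketch, the added line would look like this in each of the two files. The key and value come from the Spark migration guide quoted above; that the files pass spark.* keys through to the Spark configuration is an assumption, and the rest of each file is left unchanged.

```properties
# Restore the pre-3.2 behavior: keep partitions for empty input splits
spark.hadoopRDD.ignoreEmptySplits=false
```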

mad avatar Oct 13 '22 12:10 mad

@mad Thank you.

farodin91 avatar Oct 14 '22 06:10 farodin91

@mad Updated

farodin91 avatar Oct 14 '22 14:10 farodin91

@farodin91

TinkerPop 3.6 has some breaking changes, marked "(breaking)" in the changelog: https://github.com/apache/tinkerpop/blob/master/CHANGELOG.asciidoc

Do we need to take these into account?

mad avatar Oct 17 '22 08:10 mad

@mad I've handled multiple breaking changes, including https://issues.apache.org/jira/browse/TINKERPOP-2507 and the Gryo removal.

farodin91 avatar Oct 18 '22 07:10 farodin91

@porunov Would you like to review it again?

farodin91 avatar Oct 20 '22 16:10 farodin91