rethinkdb icon indicating copy to clipboard operation
rethinkdb copied to clipboard

Add SmartOS support

Open jperkin opened this issue 10 years ago • 33 comments

This is a set of commits that add support for SmartOS.

  • 63bcfc2 changes file_t to myfile_t, file_t is exposed from sys/file.h on SmartOS.
  • 7c4d34a changes lock_t to mylock_t, lock_t is exposed from sys/machtypes.h on SmartOS.
  • 5702f34 is a set of general portability fixes, required on SmartOS but should be generally applicable and help other ports.
  • e826138 adds the core of the SmartOS support, notably an event ports backend for the event queue and timers.
  • f257229 isn't great, hence marked XXX, but is currently required as std::to_string isn't available even in C++11 mode with GCC 4.7 on SmartOS. Looking around this seems to be a problem with other operating systems too. Ideas for handling this better would be welcome.

With these changes applied I am able to run a server in both 32-bit and 64-bit builds. It should be broadly compatible with other illumos distributions and Oracle Solaris, but those have not been verified.

Running the unittest suite on 32-bit results in:

[==========] 287 tests from 63 test cases ran. (779139 ms total)
[  PASSED  ] 285 tests.
[  FAILED  ] 2 tests, listed below:
[  FAILED  ] UtilsTest.IPAddress
[  FAILED  ] RDBBtree.SindexEraseRange

The RDBBtree.SindexEraseRange failure looks very similar to #3805:

[ RUN      ] RDBBtree.SindexEraseRange
src/unittest/rdb_btree.cc:253: Failure
Value of: groups->size()
  Actual: 0
Expected: 1
src/unittest/rdb_btree.cc:253: Failure
Value of: groups->size()
  Actual: 0
Expected: 1
[  FAILED  ] RDBBtree.SindexEraseRange (11438 ms)

The UtilsTest.IPAddress failure is:

[ RUN      ] UtilsTest.IPAddress
src/unittest/utils_test.cc:134: Failure
Value of: ip_strings.find("::ffff:203.0.113.59") != ip_strings.end()
  Actual: false
Expected: true
[  FAILED  ] UtilsTest.IPAddress (2 ms)

For this test ip_strings only contains 203.0.113.59, so it may just need a change to handle the getaddrinfo() handling on SmartOS.

When running the test suite against the 64-bit build, I see intermittent failures in the Clustering tests with heartbeat timeouts, but haven't tracked those down yet, otherwise same results as the 32-bit build. Running the standalone server seems to work fine.

I have no real interest in retaining copyright and signing the CLA, so you can consider these changes as being in the public domain and are welcome to take them as your own.

jperkin avatar May 29 '15 13:05 jperkin

[  PASSED  ] 285 tests.

Awesome! Thanks for doing this.

I have no real interest in retaining copyright and signing the CLA

I hope that doesn't stop us from merging this.

At first glance the code looks good, modulo some cosmetic changes.

AtnNn avatar May 29 '15 14:05 AtnNn

Hi @jperkin,

Thanks for your work in porting RethinkDB to SmartOS. The changes look great, except for some minor nit-picks:

  • I think the only reason why system_mutex_t::lock_t is conflicting with the system-defined lock_t is because of the friend class lock_t; declaration in arch/io/concurrency.hpp. That declaration is redundant anyway, so I'd rather you removed the declaration than rename lock_t.
  • I don't really like the name myfile_t. Perhaps rdb_file_t would be better? I'm happy to hear other ideas.
  • I suggest defining the std::to_string() workaround only once (perhaps in stl_utils.hpp) instead of in five different places.
  • In the existing epoll_event_queue_t, when epoll_wait() fails with EINTR we retry rather than aborting the program. The man page indicates that port_getn() can also fail with EINTR, so that should probably be handled in the same way.
  • evport_event_queue_t::forget_resource() should probably check the return value of port_disassociate().

(@atnnn - What cosmetic changes were you thinking of?)

timmaxw avatar May 29 '15 17:05 timmaxw

Thanks for the feedback. I've updated the pull request to incorporate these suggestions, and have also made a few fixes to the event port backend which seems to have resolved the issues I was seeing with the 64-bit build.

Test results are now:

  • 32-bit
[----------] Global test environment tear-down
[==========] 285 tests from 63 test cases ran. (790482 ms total)
[  PASSED  ] 282 tests.
[  FAILED  ] 3 tests, listed below:
[  FAILED  ] RDBBtree.SindexPostConstruct
[  FAILED  ] RDBBtree.SindexEraseRange
[  FAILED  ] UtilsTest.IPAddress
  • 64-bit
[----------] Global test environment tear-down
[==========] 285 tests from 63 test cases ran. (908458 ms total)
[  PASSED  ] 283 tests.
[  FAILED  ] 2 tests, listed below:
[  FAILED  ] UtilsTest.IPAddress
[  FAILED  ] RDBBtree.SindexPostConstruct

This is with my changes applied to the most recent v2.0.x branch.

jperkin avatar Jun 01 '15 14:06 jperkin

The changes look good to me. @atnnn, what do you think?

timmaxw avatar Jun 01 '15 17:06 timmaxw

@jperkin This is phenomenal. Assigning @AtnNn for a final decision.

@mglukhovsky Is this fine?

I have no real interest in retaining copyright and signing the CLA, so you can consider these changes as being in the public domain and are welcome to take them as your own.

danielmewes avatar Jun 01 '15 22:06 danielmewes

@jperkin -- are you completely opposed to signing the CLA (http://rethinkdb.com/community/cla/)? It's all online and only takes a few seconds. Unfortunately if we don't follow this process, we'll have to clear it with legal, which is doable but complicated. (We really appreciate the patch and want to merge it in, but bureaucracy demands to be served)

coffeemug avatar Jun 01 '15 22:06 coffeemug

Just to clarify, @jperkin works with me at Joyent -- and I think he was just trying to be expedient, not take a position against the CLA. If signing the CLA is the path to expediency, we can definitely make that happen...

bcantrill avatar Jun 02 '15 01:06 bcantrill

Thanks @bcantrill -- the CLA is definitely the most expedient path (but not the only one, if it's unworkable for you guys). FYI, all it does is signs off the copyright to these specific patches to RethinkDB. We included a lot of info in the CLA document on why that's necessary for us to operate effectively -- sorry if it causes too many inconveniences.

coffeemug avatar Jun 02 '15 01:06 coffeemug

OK, not a problem, you should have received a signed CLA from me now.

jperkin avatar Jun 02 '15 06:06 jperkin

Got it, thanks! @danielmewes @AtnNn -- feel free to merge if/when it's appropriate.

coffeemug avatar Jun 02 '15 17:06 coffeemug

I'm going to test it myself as soon as I get a SmartOS VM running.

AtnNn avatar Jun 02 '15 20:06 AtnNn

I found the cause of the missing std::to_string in joyent/pkgsrc#270. You can just ignore those patches now as pkgsrc releases from 2015Q2 onwards will include them correctly.

jperkin avatar Jun 27 '15 16:06 jperkin

@AtnNn Any particular reason why this has not been merged yet ? It's been almost 1 year... I would very much like to run RethinkDB on SmartOS.

tlvenn avatar Mar 24 '16 16:03 tlvenn

Thanks for the bump @tlvenn.

I think we had some issues getting the VM set up the last time we tried. We should pick this up again soon.

However note that we might not have the resources to fully test and support this even after merging into the main branch.

danielmewes avatar Mar 24 '16 17:03 danielmewes

Joyent is happy to provide SmartOS infrastructure for this. Please create an account for RethinkDB and then mail me ([email protected]) or DM me (@bcantrill) your username and we'll get it taken care of!

bcantrill avatar Mar 24 '16 21:03 bcantrill

Thanks @bcantrill, that's very helpful. I'll email you shortly.

danielmewes avatar Mar 25 '16 20:03 danielmewes

Hi @danielmewes , any progress you can update us with please ? Thanks !

tlvenn avatar Apr 08 '16 02:04 tlvenn

I would like to see it merged as it will open possibilities to add more platforms.

krytarowski avatar Apr 17 '16 20:04 krytarowski

I want to add NetBSD support.

krytarowski avatar Apr 17 '16 20:04 krytarowski

@bcantrill @dalanmiller any update ?

It's hard to believe Jonathan went through so much effort and it has been wasted for almost a year already...

tlvenn avatar Apr 18 '16 04:04 tlvenn

@tlvenn Sorry for the delayed reply, I was out of office for the past week. We don't have any updates on the testing yet, but I'll try to follow up shortly.

danielmewes avatar Apr 18 '16 04:04 danielmewes

I merged current next with this in branch daniel_jperkin_smartos_4309. Will try to test it asap.

danielmewes avatar May 12 '16 20:05 danielmewes

Thanks for the update @danielmewes !

tlvenn avatar May 13 '16 06:05 tlvenn

There's currently a compilation issue with V8. Looking into work-arounds...

danielmewes avatar May 13 '16 18:05 danielmewes

I'm at a bit of a loss here.

The final linking step for V8 fails as follows:

g++ -pthread  -o /root/rethinkdb/build/external/v8_3.30.33.16-patched/build/out/x64.release/mksnapshot -Wl,--start-group /root/rethinkdb/build/external/v8_3.30.33.16-patched/build/out/x64.release/obj.host/mksnapshot/src/mksnapshot.o /root/rethinkdb/build/external/v8_3.30.33.16-patched/build/out/x64.release/obj.host/tools/gyp/libv8_base.a /root/rethinkdb/build/external/v8_3.30.33.16-patched/build/out/x64.release/obj.host/tools/gyp/libv8_nosnapshot.a /root/rethinkdb/build/external/v8_3.30.33.16-patched/build/out/x64.release/obj.host/tools/gyp/libv8_libplatform.a /root/rethinkdb/build/external/v8_3.30.33.16-patched/build/out/x64.release/obj.host/third_party/icu/libicui18n.a /root/rethinkdb/build/external/v8_3.30.33.16-patched/build/out/x64.release/obj.host/third_party/icu/libicuuc.a /root/rethinkdb/build/external/v8_3.30.33.16-patched/build/out/x64.release/obj.host/tools/gyp/libv8_libbase.a /root/rethinkdb/build/external/v8_3.30.33.16-patched/build/out/x64.release/obj.host/third_party/icu/libicudata.a -Wl,--end-group -lnsl
ld: fatal: file /root/rethinkdb/build/external/v8_3.30.33.16-patched/build/out/x64.release/obj.host/tools/gyp/libv8_base.a: unknown file type
ld: fatal: file processing errors. No output written to /root/rethinkdb/build/external/v8_3.30.33.16-patched/build/out/x64.release/mksnapshot

Note that /root/rethinkdb/build/external/v8_3.30.33.16-patched/build/out/x64.release/obj.host/tools/gyp/libv8_base.a does exist, and

ar -t /root/rethinkdb/build/external/v8_3.30.33.16-patched/build/out/x64.release/obj.host/tools/gyp/libv8_base.a

reads it just fine. Still the GNU linker doesn't accept it. If I remove that input from the command, it just prints the same error for the next .a file in the list.

The only report of a similar issue I could find was http://forums.mozillazine.org/viewtopic.php?f=42&t=344719 , which has no solution.

danielmewes avatar Aug 06 '16 00:08 danielmewes

An interesting thing is that file just identifies it as data, while on my Linux system it says current at archive for a similar file. Maybe the magic for this file type is just missing?

danielmewes avatar Aug 06 '16 00:08 danielmewes

Ok made some progress. Looks like the problem is that the V8 build system creates what's called a "thin" archive, which gld then doesn't accept (not sure if that's because of the missing file magic, or due to a different reason). http://stackoverflow.com/questions/25554621/turn-thin-archive-into-normal-one shows how to convert the .a files.

danielmewes avatar Aug 06 '16 00:08 danielmewes

V8 still doesn't link, though now with a different issue:

ld: warning: file /root/rethinkdb/build/external/v8_3.30.33.16-patched/build/out/x64.release/obj.host/third_party/icu/libicui18n.a(calendar.o): wrong ELF class: ELFCLASS32
ld: warning: file /root/rethinkdb/build/external/v8_3.30.33.16-patched/build/out/x64.release/obj.host/third_party/icu/libicuuc.a(brkiter.o): wrong ELF class: ELFCLASS32

@jperkin Do you remember how you got V8 to work back when you started porting RethinkDB?

I think back then we might have been accepting a system-installed version of libv8, which probably made this easier. Right now we no longer support that.

danielmewes avatar Aug 06 '16 01:08 danielmewes

This might be related to https://github.com/joyent/pkgsrc-joyent/blob/master/rethinkdb/patches/patch-mk_support_pkg_v8.sh, or something similar where it is trying to e.g. build a 64-bit binary where the GCC default is 32-bit, or vice-versa. That's usually the cause of ELFCLASS failures.

I also ensured that we used the system (pkgsrc) icu rather than building inline, which required this patch https://github.com/joyent/pkgsrc-joyent/blob/master/rethinkdb/patches/patch-external_v8__3.30.33.16_third__party_icu_icu.gyp

Both of these are specific to the pkgsrc environment though and aren't general purpose.

jperkin avatar Aug 08 '16 15:08 jperkin

@srh What do you think about this? @jperkin Do you still want to have this change?

gabor-boros avatar May 17 '20 08:05 gabor-boros