truffleruby icon indicating copy to clipboard operation
truffleruby copied to clipboard

Ensure symlinked realpaths are not loaded twice

Open rwstauner opened this issue 2 years ago • 4 comments

closes #3138

Use a hash to cache realpath lookups and another to cache realpath -> feature.

This is based off of what I was reading in the CRuby code: https://github.com/ruby/ruby/blob/499eb3990faeaac2603787f2a41b2d9625e180dc/load.c#L358-L402 https://github.com/ruby/ruby/blob/499eb3990faeaac2603787f2a41b2d9625e180dc/load.c#L1195-L1289 but I could easily be wrong about something here.

Let me know if you see problems with the approach or a better way to do something here.

I did have a few questions remaining: Should the body of the load_unless_realpath_loaded method be synchronized? Does it make sense to wrap it in with_synchronized_features?

There was a commit in CRuby: https://github.com/ruby/ruby/pull/4887/commits/972f2744d8145db965f1c4218313bd200ea0a740#:~:text=To%20avoid%20concurrency%20issues%20when%20rebuilding%20the%20loaded%20features%0Aindex%2C%20the%20building%20of%20the%20index%20itself%20is%20left%20alone%2C%20and%0Aafterwards%2C%20a%20separate%20loop%20is%20done%20on%20a%20copy%20of%20the%20loaded%20feature%0Asnapshot%20in%20order%20to%20rebuild%20the%20realpaths%20hash.

To avoid concurrency issues when rebuilding the loaded features index, the building of the index itself is left alone, and afterwards, a separate loop is done on a copy of the loaded feature snapshot in order to rebuild the realpaths hash.

Do we not need to worry about that here because it's already being called inside with_synchronized_features? Or would that still be a concern?

rwstauner avatar Jul 27 '23 17:07 rwstauner

Design-wise that change looks pretty hacky to me. The simplest for TruffleRuby would be to realpath when adding to $LOADED_FEATURES. But that may not be compatible enough, unsure.

eregon avatar Jul 28 '23 16:07 eregon

If https://github.com/oracle/truffleruby/issues/3138 is not really an issue in practice I would probably delay this. It seems pretty messy to do correctly. I'd rather we convince CRuby to realpath all LOADED_FEATURES eagerly or do that and accept the small semantics difference (it's much simpler and likely more efficient).

eregon avatar Jul 28 '23 16:07 eregon

The simplest for TruffleRuby would be to realpath when adding to $LOADED_FEATURES. But that may not be compatible enough, unsure.

In my testing it seemed that the first time the realpath was stored it was saved to LOADED_FEATURES as whatever was actually requested. Then if that realpath was requested again, it would be ignored.

I tried to follow what the ruby C code was doing, but it's probably more complicated than what I have so far here.

I'd rather we convince CRuby to realpath all LOADED_FEATURES eagerly or do that and accept the small semantics difference (it's much simpler and likely more efficient).

Might be worth a discussion :+1:

rwstauner avatar Aug 07 '23 17:08 rwstauner

Action item for me here is I need to file a CRuby issue to suggest using realpath for everything, it would be so much simpler and consistent, and have that discussed in a dev meeting. In parallel we could try to implement the semantics of "everything realpath" in TruffleRuby and see if that's compatible enough. In either case I think this is too risky to integrate for the 23.1 release.

Also would be useful to get an answer to https://github.com/oracle/truffleruby/issues/3138#issuecomment-1655985383, cc @nirvdrum

eregon avatar Aug 16 '23 12:08 eregon