engine icon indicating copy to clipboard operation
engine copied to clipboard

Eliminates Dart->Host response copy

Open gaaclarke opened this issue 3 years ago • 28 comments

This makes platform_channel_basic_binary_2host_1MB go from 529.2 µs to 412.4 µs (22% improvement) in my local testing.

The test is in the platform channel benchmarks in the flutter repo.

I found this optimization when exploring asset loading, so it should be beneficial there as well by eliminating a copy.

WARNING: Do not merge until https://github.com/flutter/flutter/pull/105982 merges in the framework.

local testing results

Executed on iOS device.

before

BasicMessageChannel/StandardMessageCodec/Flutter->Host/Small: 56.8 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host/Large: 684.4 µs
BasicMessageChannel/BinaryCodec/Flutter->Host/Large: 49.7 µs
BasicMessageChannel/BinaryCodec/Flutter->Host/1MB: 529.2 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host/SmallParallel3: 26.1 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/Small: 39.6 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/SmallParallel3: 22.3 µs

after

BasicMessageChannel/StandardMessageCodec/Flutter->Host/Small: 54.4 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host/Large: 685.3 µs
BasicMessageChannel/BinaryCodec/Flutter->Host/Large: 50.1 µs
BasicMessageChannel/BinaryCodec/Flutter->Host/1MB: 412.4 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host/SmallParallel3: 25.3 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/Small: 39.3 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/SmallParallel3: 22.4 µs

Pre-launch Checklist

  • [x] I read the Contributor Guide and followed the process outlined there for submitting PRs.
  • [x] I read the Tree Hygiene wiki page, which explains my responsibilities.
  • [x] I read and followed the Flutter Style Guide and the C++, Objective-C, Java style guides.
  • [ ] I listed at least one issue that this PR fixes in the description above.
  • [ ] I added new tests to check the change I am making or feature I am adding, or Hixie said the PR is test-exempt. See testing the engine for instructions on writing and running engine tests.
  • [x] I updated/added relevant documentation (doc comments with ///).
  • [x] I signed the CLA.
  • [x] All existing and new tests are passing.

If you need help, consider asking for advice on the #hackers-new channel on Discord.

gaaclarke avatar Jun 13 '22 23:06 gaaclarke

It looks like this pull request may not have tests. Please make sure to add tests before merging. If you need an exemption to this rule, contact Hixie on the #hackers channel in Chat (don't just cc him here, he won't see it! He's on Discord!).

If you are not sure if you need tests, consider this rule of thumb: the purpose of a test is to make sure someone doesn't accidentally revert the fix. Ask yourself, is there anything in your PR that you feel it is important we not accidentally revert back to how it was before your fix?

Reviewers: Read the Tree Hygiene page and make sure this patch meets those guidelines before LGTMing.

flutter-dashboard[bot] avatar Jun 14 '22 00:06 flutter-dashboard[bot]

I'm not sure if this case requires a test exemption since it has a test but it exists in the other repo.

gaaclarke avatar Jun 14 '22 00:06 gaaclarke

For asset loading the time change for a 1MB payload is: 543.29 µs to 339.95 µs (37%)

Benchmark: https://github.com/flutter/flutter/pull/105982

gaaclarke avatar Jun 14 '22 22:06 gaaclarke

I'm totally convinced that not doing the copy is faster and also saves memory. But I'm not sure if this is going to cause segfaults in cases where a PROT_READ only mmaped buffer response is written to from Dart and also doesn't cause threading violations in cases APKAssetMappings.

I was looking into how to verify that this is safe and the guidance to reproduce the contrived use-case would be to duplicate EmbedderTest.PlatformMessagesCanReceiveResponse. On the native side, create an mmaped buffer that is write protected. On the Dart side, attempt to write to it. If we segfault, that would mean we need a migration. If not, we need to figure out how we can write to a write protected buffer without copies. Then hunt down those copies. Either way, I feel we'd need a different API and a migration.

chinmaygarde avatar Jun 14 '22 22:06 chinmaygarde

I had a chat with @jonahwilliams and @dnfield. We decided:

  1. Having a mutable ByteBuffer that crashes when you write to it is unacceptable
  2. Having a private channel that elides the copy and we control all the end points isn't good enough since channels are not ever really private and embedders will have to know to follow the convention of not writing to the ByteBuffer that isn't enforced beyond crashing
  3. Implementing copy-on-write semantics for the ByteBuffer will probably not work because that would introduce an extra layer of indirection, thus harming performance in other ways. Dart could probably implement that but immutable data would be sufficient too.
  4. We should definitely elide the copy in cases where we are sending over read-write memory (like platform channels)
  5. We should land the benchmark for the asset loader
  6. We should audit what is actually being sent with "flutter/assets" in customer: money to potentially build a case for prioritizing the work on the Dart VM.

gaaclarke avatar Jun 14 '22 23:06 gaaclarke

Hey, can't the operating system do copy-on-write with mmap? https://pythonspeed.com/articles/reduce-memory-array-copies/ This would be something to investigate.

gaaclarke avatar Jun 14 '22 23:06 gaaclarke

Hey, can't the operating system do copy-on-write with mmap? https://pythonspeed.com/articles/reduce-memory-array-copies/ This would be something to investigate.

Looks like we are already doing copy-on-write (https://github.com/gaaclarke/engine/blob/a236ed5db78c665eb66de540a4fcda57c7862c71/fml/platform/posix/mapping_posix.cc#L75:L75), so this probably isn't a crasher. I'll have to write the test that Chinmay is suggesting tomorrow. Potentially we can remove all copies except apk assets which are not threadsafe.

gaaclarke avatar Jun 14 '22 23:06 gaaclarke

I was able to verify that copy-on-write works, and it's not what we are currently doing. I had to apply the following patch:

diff --git a/assets/directory_asset_bundle.cc b/assets/directory_asset_bundle.cc
index 7896756614..c0d0e0d844 100644
--- a/assets/directory_asset_bundle.cc
+++ b/assets/directory_asset_bundle.cc
@@ -50,8 +50,12 @@ std::unique_ptr<fml::Mapping> DirectoryAssetBundle::GetAsMapping(
     return nullptr;
   }
 
-  auto mapping = std::make_unique<fml::FileMapping>(fml::OpenFile(
-      descriptor_, asset_name.c_str(), false, fml::FilePermission::kRead));
+  auto mapping = std::make_unique<fml::FileMapping>(
+      fml::OpenFile(descriptor_, asset_name.c_str(), false,
+                    fml::FilePermission::kRead),
+      std::initializer_list<fml::FileMapping::Protection>(
+          {fml::FileMapping::Protection::kRead,
+           fml::FileMapping::Protection::kWrite}));
 
   if (!mapping->IsValid()) {
     return nullptr;
diff --git a/fml/platform/posix/mapping_posix.cc b/fml/platform/posix/mapping_posix.cc
index aff3d645d3..52c878e4f7 100644
--- a/fml/platform/posix/mapping_posix.cc
+++ b/fml/platform/posix/mapping_posix.cc
@@ -72,7 +72,7 @@ FileMapping::FileMapping(const fml::UniqueFD& handle,
 
   auto* mapping =
       ::mmap(nullptr, stat_buffer.st_size, ToPosixProtectionFlags(protection),
-             is_writable ? MAP_SHARED : MAP_PRIVATE, handle.get(), 0);
+             MAP_PRIVATE, handle.get(), 0);
 
   if (mapping == MAP_FAILED) {
     return;

I propose we:

  1. Make FileMapping more expressive to signify we want copy-on-write by adding an extra flag to its constructor
  2. Make DirectoryAssetBundle start returning copy-on-write file mappings
  3. Add a new method to Mapping uint8_t* GetReadWriteMapping(). For in-memory mappings and copy-on-write mmap file mappings, that return a value. For other mappings it will return a nullptr.
  4. Make the dart response code copy the data if GetReadWriteMapping() returns nullptr before handing it to Dart (matching current behavior)

That way we can pass ownership of the read-write memory to Dart and a memcpy will only happen when one is required.

gaaclarke avatar Jun 15 '22 18:06 gaaclarke

I filed a bug for the fuchsia failure: https://bugs.fuchsia.dev/p/fuchsia/issues/detail?id=102798

Since it works in darwin and android it seems to be a fuchsia bug. We can cut them out of this change for now.

gaaclarke avatar Jun 15 '22 23:06 gaaclarke

Okay, I'm putting this back into review. I've bubbled up the GetMutableBytes from FileMapping to Mapping and in the cases where we have access to the mutable bytes, I pass ownership to Dart. We are able to get mutable bytes from FileMappings now because we are loading them with copy-on-write semantics (I've added a test to show that works). The following situations will no longer have the extra copies:

  1. Platform message responses on iOS and Android
  2. PlatformAssetBundle loading on iOS, Android, macOS, Linux

I disabled the feature on Fuchsia because there appears to be a bug in their mmap implementation and I've disabled the feature for windows since I tried to implement it but I lack the setup to debug my attempt for now.

gaaclarke avatar Jun 16 '22 17:06 gaaclarke

Sorry, pulling this down from review yet again. While it does functionally what I wanted, I've lost the performance benefit to platform channels on iOS when compared to previous versions. I need to investigate what happened. I'm seeing 514µs for platform_channel_basic_binary_2host_1MB which is a lot slower than earlier versions.

gaaclarke avatar Jun 16 '22 20:06 gaaclarke

Okay, I fixed it. It turns out that [NSMutableData dataWithBytes:length:] is 100µs slower than [NSData dataWithBytes:length:]. I'm not sure why, but that unfortunately means we have to use a const_cast to get the speed boost. That is safe since we are creating those data buffers in memory and essentially passing them off via unique_ptr. Objective-C doesn't provide tools to assert that statically unfortunately though.

gaaclarke avatar Jun 16 '22 20:06 gaaclarke

Dart can't resize bytedata, but the other points stand

dnfield avatar Jun 16 '22 21:06 dnfield

I agree with https://github.com/flutter/engine/pull/34018#pullrequestreview-1009730927. Returning a Dart ByteData backed by a copy-on-write file mapping will produce unexpected behavior.

Is there a specific application scenario that this is targeting?

If the goal is to improve the performance of platform messages sent by host applications, then it might be worthwhile to avoid an extra copy if the message is backed by a malloced buffer or std::vector.

If the issue is related to loading assets, then we probably need a separate Dart API that represents a read-only mapping (as mentioned above).

jason-simmons avatar Jun 16 '22 22:06 jason-simmons

Thanks for taking a look @chinmaygarde @dnfield @jason-simmons . I'll try to respond to all the feedback. There are 3 optimizations here: assets, platform channels on ios and platform channels on android. No one has a problem with the android platform channels, I'll split my feedback between the optimizations.

Assets

If the size of the buffer is changed from Dart code, there is no way to resize the underlying mapping.

As Dan pointed out, there is no API to do that. It couldn't be safely done by the Dart VM either since it doesn't know how the memory was created.

For files in read-only storage, you simply cannot get mappings that are read-write. You will get a MAP_FAILED in these circumstances. This is probably what is happening on Fuchsia and Windows. I expected the same thing to happen on iOS too since the assets are in read-only storage too. Perhaps this is in debug mode where the files are synced to writable storage? Not sure. This probably doesn't work on iOS in release mode.

I think there is a misunderstanding about how mmap works with copy-on-write. The flag that creates copy-on-write semantics is MAP_PRIVATE which means all the modifications to the data are private to the process, the files on disk are not modified so the file permissions don't matter (more info). The unit test I've provided asserts this by loading the contents in memory again after the memory is edited. The way this works is the OS will switch a VM page from looking at disk to looking at a page that is specific to the process (probably another region of RAM). This is possible on Windows too, see the documentation on FILE_COPY_MAP. It's just a notoriously finicky API and the engine requires a lot of effort getting development setup on Windows, that's why I didn't tackle it up yet.

On platforms where you can get a mutable mapping (desktops perhaps), applications can now change the files on disk because they can write to mappings they get from platform message responses.

This isn't the case because of the previous point.

iOS Platform Channels

You are relying on dodgy pointer const casts and the implementation of NSData on iOS.

The reason here is because it is mostly* safe. In most cases those NSData objects are ephemeral data blobs that we are creating (example). Unfortunately objc lacks the the ability to statically assert that we have one unique owner of the NSData, but that would be the case. When you have one unique owner, reinterpretting const memory to non-const memory is safe.

*) I say mostly safe because there is one case where it could be problematic now that I think of it. If a user is using a BinaryCodec and responds to a message with NSData, then Dart modifies that data, then it is read again in in objc that would be a problem.

This could be solved by querying the Codec if it copies the data when responding and inserting a copy if it isn't.

If you have concerns beyond that we can address them with tests.

Android Platform Channels

If the goal is to improve the performance of platform messages sent by host applications, then it might be worthwhile to avoid an extra copy if the message is backed by a malloced buffer or std::vector.

That's what's happening in the Android case, we can statically assert we have one owner of the read-write data so we are good here.

gaaclarke avatar Jun 17 '22 17:06 gaaclarke

Here's local testing results of the platform channel benchmarks for Android on a Pixel 4a

before

I/flutter (17630): BasicMessageChannel/StandardMessageCodec/Flutter->Host/Small: 361.6 µs
I/flutter (17630): BasicMessageChannel/StandardMessageCodec/Flutter->Host/Large: 2303.7 µs
I/flutter (17630): BasicMessageChannel/BinaryCodec/Flutter->Host/Large: 142.4 µs
I/flutter (17630): BasicMessageChannel/BinaryCodec/Flutter->Host/1MB: 1483.9 µs
I/flutter (17630): BasicMessageChannel/StandardMessageCodec/Flutter->Host/SmallParallel3: 86.1 µs
I/flutter (17630): BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/Small: 134.4 µs
I/flutter (17630): BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/SmallParallel3: 79.8 µs

after

I/flutter (12863): BasicMessageChannel/StandardMessageCodec/Flutter->Host/Small: 414.1 µs
I/flutter (12863): BasicMessageChannel/StandardMessageCodec/Flutter->Host/Large: 1979.8 µs
I/flutter (12863): BasicMessageChannel/BinaryCodec/Flutter->Host/Large: 119.2 µs
I/flutter (12863): BasicMessageChannel/BinaryCodec/Flutter->Host/1MB: 1024.4 µs
I/flutter (12863): BasicMessageChannel/StandardMessageCodec/Flutter->Host/SmallParallel3: 88.3 µs
I/flutter (12863): BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/Small: 135.5 µs
I/flutter (12863): BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/SmallParallel3: 82.1 µs

Looks like it is slightly slower for small payloads and 30% faster for 1MB payloads.

gaaclarke avatar Jun 17 '22 18:06 gaaclarke

I've removed the iOS implementation since it will take a bit more work to get it working safely with the BinaryCodec (or custom codecs). I'll land it in a separate PR.

gaaclarke avatar Jun 17 '22 18:06 gaaclarke

Is it beneficial to trade off the performance of small messages for larger ones?

jonahwilliams avatar Jun 17 '22 18:06 jonahwilliams

Is it beneficial to trade off the performance of small messages for larger ones?

The previous implementation was already conditionalizing how data was being treated based on its size for performance reasons: https://github.com/gaaclarke/engine/blob/a01f69234e9cc0d21d4a972e4e22bb19737e00fe/third_party/tonic/typed_data/dart_byte_data.cc#L26:L26

I can duplicate that logic here.

(sorry, i edited your comment on accident instead of replying)

gaaclarke avatar Jun 17 '22 18:06 gaaclarke

@jonahwilliams I duplicated that logic in the latest patch and here are the results for local testing (on Android pixel 4a):

I/flutter (24598): BasicMessageChannel/StandardMessageCodec/Flutter->Host/Small: 301.4 µs
I/flutter (24598): BasicMessageChannel/StandardMessageCodec/Flutter->Host/Large: 1949.6 µs
I/flutter (24598): BasicMessageChannel/BinaryCodec/Flutter->Host/Large: 124.1 µs
I/flutter (24598): BasicMessageChannel/BinaryCodec/Flutter->Host/1MB: 1055.0 µs
I/flutter (24598): BasicMessageChannel/StandardMessageCodec/Flutter->Host/SmallParallel3: 84.7 µs
I/flutter (24598): BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/Small: 195.6 µs
I/flutter (24598): BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/SmallParallel3: 127.3 µs

gaaclarke avatar Jun 17 '22 18:06 gaaclarke

@chinmaygarde friendly ping since my response was right before a long weekend.

gaaclarke avatar Jun 21 '22 23:06 gaaclarke

Chat summary

@chinmay and I had a discussion about this issue last night on Discord. Here was the takeaways, feel free to correct me if I'm misconstruing something.

  • It would be better if Dart could represent const data, because that already matches the fml::Mapping interface. However that has yet to be implemented and is lacking a design to specify how we'd deal with changing the Codec APIs to accommodate it.
  • Right now we are doing the least-common denominator where we copy the data unconditionally. We need to conditionally optimize away the copy. Even if Dart could represent const data, for the case of AAssets they'll still need to be copied since they aren't allowed to be passed between threads.
  • A big point of friction is around modifying the fml::Mapping API. IsDontNeedSafe() is seen as a mistake and we don't want that to happen again. The concern is that when/if dart implements const data representation and we migrate to using that, will could be left with a vestigial API on fml::Mapping that is difficult to remove.
  • Conditionally optimizing away the copy is added complexity, but the gains in performance merit added complexity. (20% improvement in 1MB android platform channel round trip, 30% on iOS asset loading 1MB)

Options to move forward

Here's some options I've thought of that address the concerns.

Mapping->Dart Conversion

Promoting FlutterMapping::GetMutableMapping() is kind of a big change to fml::Mapping. Someone could easily latch onto it and use it for some purpose that isn't moving the fml::Mapping to Dart. That means if we migrate to a Dart const data representation we could end up with a method that would be difficult to remove. Instead of adding GetMutableMapping() we could add a context specific method that couldn't be used for a different purpose, a Dart_Handle MoveToByteBuffer() call.

class Mapping {
  // Move the Mapping to be wrapped in a thread-safe Dart 
  // ByteBuffer.  If this call is successful, calling 
  // "GetMapping()" afterwards will return nullptr.
  virtual std::optional<Dart_Handle> MoveToByteBuffer() =0;
};

The benefit of this is that this method cannot be used for a purpose beyond what we need it for and thus making it easier to modify in the future. It could be converted to Dart_Handle MoveToConstByteBuffer() in the future.

edit: We could also make the function always return something, incurring a copy if it has to. Putting that logic in the Mapping instead of in the handler.

Visitor

If we are concerned about modifying the Mapping API we can add the functionality we need with the Visitor design pattern.

class MappingVisitor {
  virtual void Visit(FileMapping* mapping) =0;
  virtual void Visit(MallocMapping* mapping) =0;
  //...
};
class Mapping {
  virtual void Visit(MappingVisitor* visitor) =0;
};

That way we can write the logic to decide if we can transfer ownership to Dart inside of a Visitor. The downside to this is there are subclasses of fml::Mapping that are only compiled for certain embedders. We'd have to declare them in the core engine so the Visitor can reference them (I'm not sure if we can use forward declarations). This also has a slight performance cost, performing double dispatch instead of single dispatch.

Migrate to PlatformMessageResponse to MallocMapping

Migrating the response class to using MallocMapping would allow us to remove the copy in certain cases without modifying the fml::Mapping interface. One downside is that it would only easily be available to the Android Platform Channel's case since that already has a MallocMappping from the jni layer. It would never be able to pass ownership of mmap'd data to Dart and would unlikely to ever be compatible with iOS (without some trickery).

class PlatformMessageResponse {
  virtual void Complete(std::unique_ptr<fml::MallocMapping> data) =0;
};

The major downside of this though is that it may actually introduce an extra copy in the case where the payload is small so we are performing a copy when passing the data to Dart and the payload wasn't already in a MallocMapping. For example, iOS Platform Channels whose response payload is less than 1000 bytes.

gaaclarke avatar Jun 23 '22 19:06 gaaclarke

One other thought that I didn't communicate well. The data we are actually sending here in a response is one of 4 types:

  1. Ephemeral data that we can prove at compile time the messenger has unique ownership of (ex MallocMapping)
  2. A memory representation of a file (ex mmap)
  3. A non-threadsafe memory representation of a file (ex AAssets)
  4. Data that we can't statically assert is uniquely owned by the messenger. (ex NSDataMapping)

Only type 4 requires a Dart representation that is immutable. It also represents a fraction of the actual usage. Right now it is iOS platform channel usage but I think we can use convention and tests to capture most of iOS's use cases as well by proving that the messenger has unique ownership.

gaaclarke avatar Jun 24 '22 16:06 gaaclarke

Here is some data supporting the importance of this change. (@zanderso @jonahwilliams @jiahaog )

Microbenchmarks Summary

Here is a reminder of the microbenchmarks results:

  • For iOS asset loading the time change for a 1MB payload is: 543.29 µs to 339.95 µs (37%)
  • For Android Platform Channels 1MB round trip time: 1483.9 µs to 1055.0 µs (29%)

Customer Money

Here is some data that shows the potential gain for customer money.

Sqflite latency breakdown

In b/237035671 customer money asks why the wall time for loading data from sqflite takes 17ms when the actual execution time of java handler takes 1ms.

The reasons why are:

  1. The latency when scheduling on the platform thread (could be removed with https://github.com/tekartik/sqflite/issues/719)
  2. The latency when scheduling on sqlflite's thread (probably not huge)
  3. The latency when scheduling on the platform thread again (I just removed this in a PR in sqflite: https://github.com/tekartik/sqflite/pull/825)
  4. The latency when scheduling the response on the ui thread (This probably contributes the most)
  5. The payload size of the request
  6. The payload size of the response (The province of this PR if the response is >1MB)

I've asked them measure the size of their responses to understand what portion of the 16ms is the response copy.

Platform channel audit

We do have data from customer money that gives us understanding about how platform channels are being used since the savings from this PR are a function of the size of the response payloads.

Since this improvement is gated on a payload greater than 1000 bytes, here are the platforms channels reported by customer money that are communicating over 1000 bytes on the "downstream" (in the direction of Host->Flutter):

channel down_bytes
plugins.flutter.io/<redacted>#updateConfig 27733
plugins.flutter.io/device_info#getAndroidDeviceInfo 5588
com.tekartik.sqflite#query 1366

When we hear back from customer money about the individual sizes we can make sure some of these payloads are over 1000 bytes.

Asset loading at startup

I also audited asset loading on customer money's prototype app on iOS and these are the assets getting loaded at startup and their size (remember anything over 1000 bytes would receive an improvement):

asset size (bytes)
asset/ic_us_tab_bar_minus_one_light.flr 2972
asset/ic_us_tab_bar_zero_light.flr 2770
asset/ic_us_tab_bar_plus_one_light.flr 3816
asset/network/home_backsplash_us.svg.vg 31718
asset/img_logo_<redacted>.svg.vg 4126
asset/network/homescreen_empty_state_business_v2.svg.vg 2964
asset/network/homescreen_empty_state_people_default_v2.svg.vg 4412
asset/bg_invite_billboard_us.svg.vg 154
asset/img_invite_billboard_illustration_us.svg.vg 11077
asset/toolbar/toolbar_footer_light.svg.vg 69
asset/toolbar/toolbar_footer_light.svg.vg 69
asset/my_money/link_to_stored_value_logo.svg.vg 816
asset/entity/container/img_search_icon.svg.vg 1413
asset/network/remittance_discover_illustration_v3.svg.vg 5470
asset/network/us_businesses_vertical_gas.svg.vg 4628
asset/network/us_businesses_vertical_parking.svg.vg 4178
asset/network/us_businesses_vertical_transit.svg.vg 2596

We cannot easily say how much time this PR would save the app at startup, or generalize for every possible app because it depends on how parallelized the code is and the system is. We'll get a better idea if we can get them to test the PR.

gaaclarke avatar Jun 24 '22 21:06 gaaclarke

@jiahaog may know the most up-to-date way to try out this patch. Suggest following up offline.

zanderso avatar Jun 24 '22 21:06 zanderso

I've audited the time spent performing this copy of data in customer money's prototype app on iOS at startup as well. There were 20 instances of the copy called with a total time of 7ms on an iPhone SE 2 (this is the copy that is removed in this PR for iOS assets): Screen Shot 2022-06-24 at 2 51 36 PM

profile patch
--- a/third_party/tonic/typed_data/dart_byte_data.cc
+++ b/third_party/tonic/typed_data/dart_byte_data.cc
@@ -8,6 +8,18 @@
 
 #include "tonic/logging/dart_error.h"
 
+#include <os/signpost.h>
+
+static os_log_t points_of_interest_logger = NULL;
+
+static os_log_t GetPointsOfInterestLogger() {
+  if (!points_of_interest_logger) {
+    points_of_interest_logger =
+        os_log_create("dev.flutter", OS_LOG_CATEGORY_POINTS_OF_INTEREST);
+  }
+  return points_of_interest_logger;
+}
+
 namespace tonic {
 
 namespace {
@@ -28,9 +40,17 @@ Dart_Handle DartByteData::Create(const void* data, size_t length) {
     // The destructor should release the typed data.
     return handle;
   } else {
+    static uint64_t counter = 1;
+    static uint64_t identifier = counter++;
+    if (__builtin_available(macOS 10.14, iOS 12.0, *)) {
+      os_signpost_interval_begin(GetPointsOfInterestLogger(), identifier, "DartByteData memcpy");
+    }
     void* buf = ::malloc(length);
     TONIC_DCHECK(buf);
     ::memcpy(buf, data, length);
+    if (__builtin_available(macOS 10.14, iOS 12.0, *)) {
+      os_signpost_interval_end(GetPointsOfInterestLogger(), identifier, "DartByteData memcpy");
+    }
     return Dart_NewExternalTypedDataWithFinalizer(
         Dart_TypedData_kByteData, buf, length, buf, length, FreeFinalizer);
   }

gaaclarke avatar Jun 24 '22 21:06 gaaclarke

@chinmaygarde friendly ping. This is caught up on your feedback. I responded to it with some options to move forward. Can you please capture where your head is at now. I'm back from vacation now and can fill in any gaps, but it seems like discussion on his PR usually comes back to people trying to interpret where your thoughts are now.

FWIW @zanderso told me that const buffers in Dart should be coming this quarter so keep that in mind that there seems to be movement there.

gaaclarke avatar Jul 22 '22 16:07 gaaclarke

Rebased to help @jiahaog's testing.

gaaclarke avatar Jul 27 '22 20:07 gaaclarke

Update: The Dart team believes they may have a way to present a const bytebuffer with little runtime performance. Previously they thought the cost to reading and writing would make it not worth it because of some global polymophism optimization that is currently happening.

We got the latests results from customer: money on startup time. This copy occupies 0.9% (9ms) of the ui thread at app startup on Android. We don't have an official measurement for iOS and it would be very different since it is eliminating asset loading copies.

gaaclarke avatar Aug 15 '22 16:08 gaaclarke

The unmodifiable bytes change for Dart has landed: https://github.com/dart-lang/sdk/commit/c1e67ac84f7d862aee14bbb9ce09346cef21dfa0

I haven't had a chance to hook it up. This looks like we can make it a non-breaking change compile-wise since UnmodifiableByteDataView implements ByteData (https://github.com/dart-lang/sdk/commit/c1e67ac84f7d862aee14bbb9ce09346cef21dfa0#diff-ff924e00bfbb53afcdf9b21700dd244df23776793371f29db6c46c47b8987b0fR5487). It will still technically be a breaking change at runtime (that I suspect no one will run into). I've been asked by the Dart team to check out the performance ramifications too.

gaaclarke avatar Aug 16 '22 23:08 gaaclarke