zed Add max_file_size restriction

Release Notes:

Added a max file size restriction to prevent performance issues or crashes when opening large files in Zed. If a user attempts to open a file exceeding the max_file_size setting (default limit: 10MB), a modal will display the message: "File size of X MB exceeds maximum allowed size: Y MB". This setting can be adjusted in bytes within the project settings.

Background:

In my experience with Zed, opening large files inadvertently has led to significant slowdowns and, in some cases, crashes, resulting in lost work.

Additional Notes:

Ideally, large files would be opened in read-only mode to mitigate memory consumption issues. However, given Zed's current architecture where files are fully loaded into memory, implementing this feature would require substantial changes. Therefore, introducing a file size restriction serves as an essential safeguard to prevent data loss and enhance user experience.

https://github.com/zed-industries/zed/assets/13990333/64090193-5317-41b6-bc2f-0f9cd82a9d1a

Mar 01 '24 09:03 murilopl

We require contributors to sign our Contributor License Agreement, and we don't have @murilopl on file. You can sign our CLA at https://zed.dev/cla. Once you've signed, post a comment here that says '@cla-bot check'.

Mar 01 '24 09:03 cla-bot[bot]

@cla-bot check

Mar 01 '24 09:03 murilopl

The cla-bot has been summoned, and re-checked this pull request!

Mar 01 '24 09:03 cla-bot[bot]

Hey, thanks for the PR! For what it's worth we've already profiled opening of large files internally; there was a recent PR that addressed some of the issues with loading them (https://github.com/zed-industries/zed/pull/7731); one other outstanding item is replacing buffer fingerprint with something that does not require hashing contents of the whole buffer. I am personally a bit torn on whether we should limit the file size by default.

Mar 01 '24 10:03 osiewicz

Hey guys, thanks for the comments! @SomeoneToIgnore I'm gonna review your suggestions asap.

@osiewicz I agree with you, users should be able to open any file they want. IntelliJ blocks to read only when you try to open files 20mb+.

I tried to use the same strategy here but since the problem is with 'smol::fs::load', didn't resolve.

My main concern is users suffering the problem I had, losing a ton of non saved work because accidentally opened a big file. Limiting a rare scenario like that, can prevent this without hurting too much the user experience while a better way is implemented. What you think?

Mar 01 '24 11:03 murilopl

I see what you're saying, but I don't think we should outright refuse to load a file that big. I think what @SomeoneToIgnore suggested makes sense; to have a popup that warns that you're about to open a big file. Then, the exact cutoff for the size of file can be modified with setting.

Mar 01 '24 11:03 osiewicz

@osiewicz agree Thanks for the feedback!

Mar 01 '24 11:03 murilopl

@osiewicz @SomeoneToIgnore I have some considerations regarding the modal implementation:

The modal should be blocking, meaning that the file should not be loaded into memory if the user chooses to heed the warning instead of ignoring it. To achieve this, I'm considering a couple of options:

Wrap the entire Project::open_path flow in an Option.
Create a validation function that checks and waits for the user's response to the notification before calling Project::open_path. This means that every place where open_path is called would need to call this validation function to handle the logic.

Do you have any suggestions for a better approach? I'm not entirely satisfied with these strategies yet.

Mar 02 '24 05:03 murilopl

I think there's no way around Project::open_path (or, maybe, workspace counterparts above, but that seems to change nothing) — good that there's a centralized place to call the pop-up in, after all.

But the changes in Project::open_path do not seem very global, unless I'm missing something?

It's return value now is Task<Result<(Option<ProjectEntryId>, AnyModel)>> and what we want to do now, is to spawn a new async thing (will work, as we return Task<Result<..) that will not return either ProjectEntryId or AnyModel, but is still not a failure. So, we need to introduce some enum and return it, smth. like

enum OpenedItem {
   ExternalFile(AnyModel),
   ProjectEntry(ProjectEntryId, AnyModel),
   PendingOpenPopUp,
}

fn open_path(...) -> Task<Result<OpenedItem>> ...

Naming is ad-hoc, but I would be interested to separate those Option<ProjectEntryId> things unless there are difficulties with this.

We would need to add inside the open_path a logic, that checks for the file size and returns a pop-up with a way to trigger the action: Weak<Workspace>, file path and other data that's needed to re-trigger the same open_path (or workspace.load_path?) when a button on a modal is clicked.

Inside the open_path, I think it's better to introduce a private function, that depends on some check_size: bool new parameter: we will use open_path with check_size: true on current opening of a local file via the workspace, use check_size:false in the deserialization function (it's the host that decided to open the file, as the client we can only obey) and in the new logic, that will spawn the modal pop-up inside the open_path function.

Hope that helped, but please feel free to ask for more clarifications if needed.

Mar 02 '24 08:03 SomeoneToIgnore

Thanks for the clarification. The enum solution seem great! I having a issue working with the thread/context flow in this logic: https://github.com/zed-industries/zed/blob/6fcd57ac53cd73adb717532c1f76b132a998e8c8/crates/project/src/project.rs#L1669

The challenge is integrating the file size check and modal dispatch into the existing async flow. Ideally, I need an async context to validate the file size and handle the modal logic before open_buffer, but this requires a separate thread to wait for the modal response before proceeding with buffer loading. However, moving open_buffer inside the spawn is not feasible due to its dependency on ModelContext, and I can't await the modal flow before creating the open_buffer task without an async join.

Here's an illustrative example of what I'm aiming for:

    pub fn open_path(
        &mut self,
        path: ProjectPath,
        cx: &mut ModelContext<Self>,
    ) -> Task<Result<OpenedItem>> {
        // Inside the `self.validate_path` contain the entire logic of checking the file size, opening the modal etc...
        // This is a entire async flow that I need to wait before going to call the self.open_buffer
        if !self.validate_path(path, cx).await? {
            return Ok(OpenedItem::PendingOpenPopUp);
        }

        let task = self.open_buffer(path.clone(), cx);
        cx.spawn(move |_, cx| async move {
            let buffer = task.await?;
            let project_entry_id = buffer.read_with(&cx, |buffer, cx| {
                File::from_dyn(buffer.file()).and_then(|file| file.project_entry_id(cx))
            })?;

            let buffer: &AnyModel = &buffer;
            Ok(OpenedItem::ProjectEntry(project_entry_id, buffer.clone()))
        })
    }

Mar 02 '24 09:03 murilopl

gpui::Task is somewhat special and may act as a Rust Future (not running unless .await'ed) but not in this case definitely: it's the sync code that starts opening the file that runs first there.

But cx.spawn for view-related cx should have a weak model to its related element:

diff --git a/crates/project/src/project.rs b/crates/project/src/project.rs
index 45aca64a5..fa9b11bf2 100644
--- a/crates/project/src/project.rs
+++ b/crates/project/src/project.rs
@@ -1665,8 +1665,8 @@ impl Project {
         path: ProjectPath,
         cx: &mut ModelContext<Self>,
     ) -> Task<Result<(Option<ProjectEntryId>, AnyModel)>> {
-        let task = self.open_buffer(path.clone(), cx);
-        cx.spawn(move |_, cx| async move {
+        cx.spawn(move |project, mut cx| async move {
+            let task = project.update(&mut cx, |project, cx| project.open_buffer(path, cx))?;
             let buffer = task.await?;
             let project_entry_id = buffer.read_with(&cx, |buffer, cx| {
                 File::from_dyn(buffer.file()).and_then(|file| file.project_entry_id(cx))

we need to do .update(...)? where ? just bails if the weak modal ref is not upgradeable into a strong one. and inside, we can run the sync code that gets the task, we return it out of the sync code and .await.

This way, both the new checking code and the old async one is now in the same, new task from cx.spawn

Mar 02 '24 10:03 SomeoneToIgnore

Quite a lot to change here still, so will close it for now as stale — feel free to reopen and/or discuss it further.

Mar 25 '24 07:03 SomeoneToIgnore

zed zed copied to clipboard

Add max_file_size restriction

zed
zed copied to clipboard