godot icon indicating copy to clipboard operation
godot copied to clipboard

Fix slow editor load on large projects (v2)

Open Hilderin opened this issue 1 year ago • 22 comments

  • Fixes #94917

This PR includes the modifications of #95672 and the modifications in core/io/file_access.cpp from #93064. This PR includes more modifications then #95672 and more possibilities of side effects. That's why I created a different PR.

The main problem while loading a project is the parsing and reading of all the .import and .md5 files. On a project of 60000 files, it means 120000 files. Worst was that each .import file was read 3 times in EditorFileSystem::_test_for_reimport because ResourceFormatImporter::_get_path_and_type was called internally 2 times at least. Once in ResourceFormatImporter::are_import_settings_valid and another time in ResourceImporterLayeredTexture::are_import_settings_valid while calling ResourceFormatImporter::get_singleton()->get_resource_metadata(p_path)

I did these modifications in EditorFileSystem::_test_for_reimport:

  • Removed the if (!FileAccess::exists(p_path + ".import")), usually, the file should exists and there's a FileAccess::open on the next line that returns an error if the file does not exists.
  • The metadata is read directly instead of needing to call ResourceFormatImporter::get_singleton()->get_resource_metadata(p_path) which re-read the .import file
  • are_import_settings_valid is called directly on the importer with the metadata preventing a call to ResourceFormatImporter::get_singleton()->are_import_settings_valid which re-read the .import file

I tested a project with all the images from https://github.com/PMDCollab/SpriteCollab/tree/master/sprite which contains 61332 png.

Before this PR:

  • Loading the project in debug: 1m50sec
  • Loading the project in 4.3 stable: 1m40sec

After this PR:

  • Loading the project in debug: 50sec
  • Loading the project in production build and optimize=speed lto=full: 30sec

Note: I got some variations when testing on the loading times. If I restart the editor quickly after another, the loading time is a bit faster. My guess is that some files are in the disk cache the second time around.

Hilderin avatar Aug 17 '24 02:08 Hilderin

15 seconds to open the whole tps demo forge ( that icludes the time to enable the cronometer and then going to godot to open the project (ON ANDROID s23+) , though i noticed that while super fast now scan after reloading got stuck at 89 sometimes or numbers like that before reopening the scene. Also this is an artifact build so prob the end result could be faster ? ( not sure if artifacts are optimized..)

Meanwhile on 4.3 stable it 25 seconds upon reloading or opening project.

Edit: Considering this is now GAS GAS GAS level of speed, it might be possible to speedup the transition from the godot logo to the editor as that's also gets kind of affected on bigger projects.

Also tried the whole sprite collab project (master, because why not)

1 minute for the godot logo transition to the editor

8:30 minutes for the whole scan of the project

And then got frozen minutes and closed it , because it took too long( though even the file manage was like this taking over 15 minutes just to extract it.

Saul2022 avatar Aug 17 '24 04:08 Saul2022

Can confirm that this makes the rescan from e.g. renaming a file in a large project (100.000+ files) significantly faster, like 10000% faster.

Maybe unrelated to this PR but what is still super slow after this is the very first time editor start every day. For some reason it loads all the resource files again instead of just checking the meta data or md5. E.g. all the mesh assets in the project get reloaded from disk. Since the project has TB of mesh data this takes many minutes. Basically my old issue https://github.com/godotengine/godot/issues/47053

smix8 avatar Aug 17 '24 09:08 smix8

though i noticed that while super fast now scan after reloading got stuck at 89 sometimes or numbers like that before reopening the scene

That should be fixed by #93064, the issue is caused by the fact that the editor freezes after the file scan and the progress bar got stuck at the position where is was at the last frame.

Hilderin avatar Aug 17 '24 10:08 Hilderin

Maybe unrelated to this PR but what is still super slow after this is the very first time editor start every day. For some reason it loads all the resource files again instead of just checking the meta data or md5. E.g. all the mesh assets in the project get reloaded from disk. Since the project has TB of mesh data this takes many minutes. Basically my old issue https://github.com/godotengine/godot/issues/47053

Unfortunately, I was never able to reproduce this problem on my side even if I tried on different OS and project :(

Hilderin avatar Aug 17 '24 10:08 Hilderin

, the issue is caused by the fact that the editor freezes after the file scan.

Alright i remembered when i tested it for other issue and that was fixed so it’s fine.

Aside from it and maybe a bit off topic, you have any idea on how to improving further the loading times in eg project with such massive data like the whole project ( instead of just the sprites) linked that had to wait 8 minutes just to scan every file and then it keeps not responding( and there wasn’t any scene that had to be reopened, so that’s because of the files.

Saul2022 avatar Aug 17 '24 17:08 Saul2022

I'm not sure what linked project you are referring? I was not able to access the linked project from the original issue. As for the editor freeze, the PR #93064 should fix that. The first time a project is opened, it has to scan for files and after that it needs to analyze what to do with them. Currently, without #93064, there's no progress on this analysis process and with a lot of files and specially on Android, it could take a while. This PR also has some improvements on performance during this analysis phase but I would not expect to reduce dramatically the first loading time. I'm not really sure to understand correctly the problem without more details. Maybe create another issue with more details on your machine, your project, etc....?

Hilderin avatar Aug 17 '24 17:08 Hilderin

I'm not sure what linked project you are referring? I was not able to access the linked project from the original issue.

I mean with the https://github.com/PMDCollab/SpriteCollab/tree/master Whole project downloading the zip file,

As for the editor freeze, the PR #93064 should fix that.

Sorry for confusion though i kind of confirmed it did with the popup dialog

Maybe create another issue with more details on your machine, your project, etc....?

Might do it when this and the dock v2 prs are merged to have the window, because it still kind of weird that after scanning completely all files it takes so long to respond after the full scan when there’s no scene to do anything, just image files on the file system , maybe that just related to the import system making adjustments for every single file.

Saul2022 avatar Aug 17 '24 18:08 Saul2022

Might do it when this and the dock v2 prs are merged to have the window, because it still kind of weird that after scanning completely all files it takes so long to respond after the full scan when there’s no scene to do anything, just image files on the file system , maybe that just related to the import system making adjustments for every single file.

Perfect. I also think it's because it's processing every file. It should be easier to pinpoint the problem once these PRs are merged. Thanks.

Hilderin avatar Aug 17 '24 18:08 Hilderin

Perfect. I also think it's because it's processing every file. It should be easier to pinpoint the problem once these PRs are merged. Thanks.

No problem i have seen many people complain for a long time on godot bad losdong times on big projects with lots of gbs, either seen people experience freezes, editor quitting itself or crash after huge load times. As for processing every file , maybe as all this images uses the same sethings( because they all are pngs or jpegs) just have one check to change all the image properties at once without having to call it individually for them, not sure how it might work.

Saul2022 avatar Aug 17 '24 20:08 Saul2022

This is really outstanding work, but there is something that is worrying me in general. Technically, EditorFileSystem is designed so that those files (.import or anything else) are only parsed on first editor load and then never again, since the file date and the information contained should be cached.

Basically, the expected behavior is that the second time you open the editor, the filesystem cache file is read, all files are simply checked for date if modified and if not nothing is done.

Still, I've seen reports or comments from users that even renaming a file is causing massive amount of work, so I have the feeling that this caching system has broken at some point and contributors kind of assume that "slow" is the way it should be expected to work.

I would really suggest investigating that the caching system is working to ensure we are not just band-aiding around a more fundamental bug (even if, of course, this optimization is still super welcome).

reduz avatar Aug 18 '24 07:08 reduz

, EditorFileSystem is designed so that those files (.import or anything else) are only parsed on first editor load and then never again, since the file date and the information contained should be cached.

It kind of does no? Since on second load scan and opening stuff is faster than on the first. For example first time on trying to import sponza ivy glrf ( over270 mb ) it took 18:3 seconds to load the whole editor while on second time it took 12-11 seconds after reloading the editor on the stable version, being the thing that holds like 6 seconds on both times the godot logo transition to the editor

Saul2022 avatar Aug 18 '24 08:08 Saul2022

Basically, the expected behavior is that the second time you open the editor, the filesystem cache file is read, all files are simply checked for date if modified and if not nothing is done.

That makes a lot more sense! Thanks for your feedback. I'll check that, I have a couple of ideas! Probably in another PR just to not mess with this one and to be careful.

Hilderin avatar Aug 18 '24 11:08 Hilderin

After some investigation, the main issue comes from calling _test_for_reimport on each asset file, which reads the .import and md5 files, etc. This method was added in commit bb83c7d6b7c11c92c6c6bf3a151b3f0749d7c230 to fix issue #13135.

If I understand the original issue correctly, the logic should be:

  • If the modification date changed:
    • If the hash changed:
      • Reimport

Currently, the logic is:

  • If the modification date changed AND the hash has not changed:
    • Reimport

From what I understand, the original issue seems to point out that sometimes the last modification date could be different even if the file was not modified, which is probably the case in issue #47053, and we have confirmation of this on some Android devices (#94416). In that case, comparing the hashes could prevent a useless asset reimportation. The only missing part is a way to check if the content of the .import file is different to prevent issue #60730.

Is my understanding correct?

Hilderin avatar Aug 18 '24 12:08 Hilderin

@Hilderin Sorry I can't type much as I am on the phone. I think the idea is that the import file is only reparsed if the date changed. Then for imported assets, first date is checked and if it fails, then md5 is checked. Ultimately if both fail a reimport happens.

reduz avatar Aug 19 '24 09:08 reduz

I finally decided to implement all the modifications in this PR. Following @reduz suggestion, I investigated more deeply why we needed to read all these files on startup.

Issues that I found:

  • All the .import files were always read, even if the last modified date did not change, when the project setting editor/import/reimport_missing_imported_files is true, which is its default value.
  • If editor/import/reimport_missing_imported_files is true, it was necessary to read the .import file to obtain the destination files of the import to validate if they still exist.
  • In 2017, when the original MD5 was added, the method _test_for_reimport did not check the importer, importation settings, or MD5 while scanning. Only when the last modification date of the asset file or the .import file were different.
  • Over the years, with each commit, additional steps were added in the scan process, such as reading the MD5 file, reading the importer from the resource type to check the importer version, and validating the import settings during the first scan, which gradually slowed down the scan.

What I suggest in this PR:

  • I added the MD5 of the .import file to the filesystem_cache8. This makes it easy to determine if the importation settings have changed and removes the dependence on the last modified date, which can be unreliable. This should fix issues #47053 and #94416 while still prevent issue #60730.
  • I added the destination files to the filesystem_cache8, removing the need to read the .import file at all during editor startup.
  • I divided the _test_for_reimport function into two distinct functions to clarify the process and prevent future errors. The first, _is_test_for_reimport_needed, is used only during the scan process, and I kept _test_for_reimport without the parameter p_only_imported_files, which is used when the last modification dates are different to verify more deeply if a reimportation is needed by validating the MD5, importer, importer settings, etc.

The algorithm now works like this:

  • Load the last filesystem_cache8 if present.
  • Scan process (first scan and scan for changes):
    • For each file, add the asset for an ACTION_FILE_TEST_REIMPORT:
      • If the last modification date of the asset file has changed
      • Or if the .import file is missing
      • Or if the last modification date of the .import file has changed
      • Or if editor/import/reimport_missing_imported_files is true and at least one of the destination files is missing
  • Process each file action ACTION_FILE_TEST_REIMPORT (in _update_scan_actions):
    • Add the file for reimportation:
      • If the .import file is missing
      • Or if the MD5 for the .import file is different
      • Or if the importer name in the .import file does not exist
      • Or if the importer version is greater
      • Or if the importer settings are invalid
      • Or if the MD5 of the source asset file is different
      • Or if the MD5 of the destination files is different
  • Reimport each file marked for reimportation.
  • Save the filesystem_cache8.

Time of the scans:

  • Before this PR: 1 min+
  • After these modifications: 8-9 sec.

Hilderin avatar Aug 21 '24 01:08 Hilderin

I finally decided to implement all the modifications in this PR. Following @reduz suggestion, I investigated more deeply why we needed to read all these files on startup.

Wouldn't it be better to separate it on another pr so it easier to merge this one first as is safer and then take the turbo speedup one for another.

Time of the scans:

  • Before this PR: 1 min+
  • After these modifications: 8-9 sec.

Can confirm , literal on a project with 5k files total ( https://github.com/user-attachments/files/16075644/test_base_mesh.zip ) the file scan is instant getting to 100 % just after the logo transition ( which is already pretty fast now) the scene load the last what holding back, but that's for a future pr. Great job!

Saul2022 avatar Aug 21 '24 02:08 Saul2022

Wouldn't it be better to separate it on another pr so it easier to merge this one first as is safer and then take the turbo speedup one for another.

I'm not sure. I was thinking that it's easier to test everything at once and will prevent conflicts if the first PR needs changes. Also, we're not in a hurry; this problem has been present for a long time. I can gladly split this PR if the reviewers think it's better.

Hilderin avatar Aug 21 '24 09:08 Hilderin

I think this single-PR approach is preferable in this case; no real reason to have a standalone PR with just a bandaid solution only to immediately follow it with the actual fix. Either way, the changes affect core pretty heavily, so they'd be subject to thorough review/scrutiny regardless.

Repiteo avatar Aug 21 '24 17:08 Repiteo

I think this single-PR approach is preferable in this case; no real reason to have a standalone PR with just a bandaid solution only to immediately follow it with the actual fix.

Alright then , i am just excited to have it merged and possibly combine witht he dock pr to speed even further the scan speed and editor feedback.

Also for pushing the file system scan further i increased tps files to over 39.87gb and over 35k files ( .godot included) and now whole load is 33.09 seconds( decent jump when with 2.5 gb it was 13 secs) Still pushing well for a phone, but curious on why smix gdscript takes less than 2 seconds to scan his project with tons of gbs..

Edit : Also for some reasson making scene quit and load project from manager, loads faster than project reload in editor , and after this the project now loads in 23 seconds..

Saul2022 avatar Aug 22 '24 05:08 Saul2022

Rebased to fix conflict with master.

Hilderin avatar Sep 09 '24 21:09 Hilderin

Rebased to fix merge conflicts.

Hilderin avatar Sep 20 '24 15:09 Hilderin

I confirm as well. The fix is working good on real project with 4.5gb of assets (about 15k files). Thank you!

H2xDev avatar Sep 23 '24 08:09 H2xDev

Could it be cherrypicked for 4.3.x?

H2xDev avatar Sep 25 '24 11:09 H2xDev

Thanks! Awesome work :tada:

akien-mga avatar Sep 26 '24 10:09 akien-mga

Lurker, but my project has 100k assets and this will greatly improve my workflow. Thank you for this! Awesome work indeed!!

fgheorghe avatar Sep 26 '24 11:09 fgheorghe

Victory!!!!!!!! Retty helpful pr and fast.

Saul2022 avatar Sep 27 '24 03:09 Saul2022

It looks like the new _is_test_for_reimport_needed doesn't consider assets that has an editor variant (Introduced in https://github.com/godotengine/godot/pull/64938). ResourceFormatImporter::get_singleton()->are_import_settings_valid(path) are not called when editor theme settings changed.

hayahane avatar Jun 04 '25 08:06 hayahane