Particle filters lead to many repeated reads of the same files
Take the following example:
```python
import yt

@yt.particle_filter(filtered_type="io")
def DM_lores(pfilter, data):
    return data[pfilter.filtered_type, "particle_mass"].to("code_mass").d > 3.1e-6

ds = yt.load_sample("output_00080")
ds.add_particle_filter("DM_lores")

yt.set_log_level(10)  # to see the IO footprint
ds.r["DM_lores", "particle_position"]
```
This leads to each file being read twice: once to filter on particle mass, and a second time to obtain the positions. This is suboptimal, since all the reading could be done in a single pass.
```diff
diff --git a/yt/frontends/ramses/io.py b/yt/frontends/ramses/io.py
index 6f241631a..0ba375075 100644
--- a/yt/frontends/ramses/io.py
+++ b/yt/frontends/ramses/io.py
@@ -91,6 +91,8 @@ def _ramses_particle_binary_file_handler(particle_handler, subset, fields, count
     ds = subset.domain.ds
     foffsets = particle_handler.field_offsets
     fname = particle_handler.fname
+    fields = list(fields)
+    mylog.debug("Reading %s: %s", fname, fields)
     data_types = particle_handler.field_types
     with FortranFile(fname) as fd:
         # We do *all* conversion into boxlen here.
```
Logs
```
yt : [DEBUG ] 2025-08-04 11:21:25,431 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00001: [('io', 'particle_mass'), ('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,437 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00002: [('io', 'particle_mass'), ('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,439 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00003: [('io', 'particle_mass'), ('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,440 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00004: [('io', 'particle_mass'), ('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,441 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00005: [('io', 'particle_mass'), ('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,447 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00006: [('io', 'particle_mass'), ('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,447 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00007: [('io', 'particle_mass'), ('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,447 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00008: [('io', 'particle_mass'), ('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,447 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00009: [('io', 'particle_mass'), ('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,448 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00010: [('io', 'particle_mass'), ('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,448 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00011: [('io', 'particle_mass'), ('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,448 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00012: [('io', 'particle_mass'), ('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,449 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00013: [('io', 'particle_mass'), ('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,451 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00014: [('io', 'particle_mass'), ('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,452 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00015: [('io', 'particle_mass'), ('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,453 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00016: [('io', 'particle_mass'), ('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,462 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00001: [('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,466 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00002: [('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,467 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00003: [('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,469 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00004: [('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,469 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00005: [('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,476 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00006: [('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,476 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00007: [('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,477 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00008: [('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,477 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00009: [('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,477 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00010: [('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,477 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00011: [('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,478 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00012: [('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,479 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00013: [('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,481 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00014: [('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,481 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00015: [('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
yt : [DEBUG ] 2025-08-04 11:21:25,482 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00016: [('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
```
The situation is worse with chained filters:

```python
import yt

@yt.particle_filter(filtered_type="io")
def DM_lores(pfilter, data):
    return data[pfilter.filtered_type, "particle_mass"].to("code_mass").d > 3.1e-6

@yt.particle_filter(filtered_type="DM_lores")
def DM_lores_some_ids(pfilter, data):
    return data[pfilter.filtered_type, "particle_identity"] < 10000

ds = yt.load_sample("output_00080")
ds.add_particle_filter("DM_lores")
ds.add_particle_filter("DM_lores_some_ids")

yt.set_log_level(10)  # to see the IO footprint
ds.r["DM_lores_some_ids", "particle_position"]
```
The files will now be read 4 times!
```
yt : [DEBUG ] 2025-08-04 11:26:15,891 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00001: [('io', 'particle_mass'), ('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
[...]
yt : [DEBUG ] 2025-08-04 11:26:16,065 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00001: [('io', 'particle_identity'), ('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
[...]
yt : [DEBUG ] 2025-08-04 11:26:16,099 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00001: [('io', 'particle_mass'), ('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
[...]
yt : [DEBUG ] 2025-08-04 11:26:16,123 Reading /home/XXX/Documents/prog/yt-data/output_00080/part_00080.out00001: [('io', 'particle_position_x'), ('io', 'particle_position_y'), ('io', 'particle_position_z')]
[...]
```
In addition to repeatedly reading the same parts of the files (the positions are read every time), this also isn't cache-friendly, since we read chunks of each file instead of the whole file at once. Here we read x, y, z, m; then x, y, z, id; then x, y, z, m again; and finally x, y, z. It would be much more efficient to read x, y, z, id, m in a single pass and then apply the filter(s).
I wonder if we could improve this by making sure that a ParticleFilter records all of its base requirements rather than only its direct requirements. That is, at the time a filter is added, we would recursively evaluate its requirements and store the whole list, which might allow them all to be read in one go.
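A rough sketch of that recursive gathering, using a toy filter registry in place of yt's actual data structures (the `FILTERS` dict and field tuples below are illustrative assumptions, not yt internals):

```python
# Toy registry standing in for yt's filter bookkeeping: each filter records
# the type it filters and the fields its condition directly requires.
FILTERS = {
    "DM_lores": {"filtered_type": "io", "requires": [("io", "particle_mass")]},
    "DM_lores_some_ids": {
        "filtered_type": "DM_lores",
        "requires": [("DM_lores", "particle_identity")],
    },
}

def base_requirements(ftype):
    """Walk up the filter chain and gather every base ('io') field needed."""
    fields = set()
    while ftype in FILTERS:
        spec = FILTERS[ftype]
        # Re-home each required field onto the base type read from disk.
        fields.update(("io", fname) for _, fname in spec["requires"])
        ftype = spec["filtered_type"]
    return fields
```

For the chained example above, this would report that `DM_lores_some_ids` ultimately needs both `particle_mass` and `particle_identity` from the `io` type, so one read could fetch them alongside the positions.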
Someone should correct me, but it looks to me as if a particle filter's required fields are never considered directly. Instead, the work is done indirectly in ParticleFilter.apply, which reads fields as needed but never takes stock of everything it is going to need and asks for all of it up front.
OK, last contribution from me for today. The application of particle filters happens in a completely separate place from field reading, namely yt/data_objects/selection_objects/data_selection_objects.YTSelectionContainer.get_data. More specifically, filtering happens before the dependencies of the requested field are evaluated. I think that's where the work would have to happen.
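To illustrate the reordering being suggested, here is a hypothetical sketch (none of these names are yt's actual API): resolve every dependency, filter requirements included, before issuing a single read.

```python
def get_data_single_pass(requested_fields, filter_requirements, read):
    """Hypothetical sketch: resolve all dependencies, then read once.

    `read` stands in for the frontend IO handler and is called exactly
    once, with the union of the requested and filter-required fields.
    """
    all_fields = sorted(set(requested_fields) | set(filter_requirements))
    raw = read(all_fields)  # single IO pass over each file
    # Filtering would happen here, in memory, on `raw`.
    return {f: raw[f] for f in requested_fields}
```

The point is only the ordering: dependency evaluation (step 1) must precede IO (step 2) so that the filter fields and the requested fields land in the same read.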
I am going through the code and trying to figure out why it's done this way, and if it was the result of beating my head at a problem and giving up or if it was an oversight/failure on my part.
I think the problem may be related to the fact that we allow the fields to filter based on the fields we return, rather than explicitly not including those, but I am not entirely sure.
OK, I've convinced myself that `requires` is specifically designed to avoid over-reading, and that this is indeed an error, not the result of giving up.
I'm not sure I understood your last message - are you saying that the reason we have multiple reads is because of a bug?