Investigate Memory Usage of Scoring
Proteomics Dataset
16 runs, ~32K precursors (targets + decoys), ~196K transitions, ~1.4M precursor features (peak-groups)
Command
/usr/bin/time pyprophet score --in merged_osw.parquet --level ms1ms2 --classifier SVM --xeval_num_iter 3 --ss_num_iter 3 --threads 3 --profile
Peak RAM usage is ~17.34 GB (maxresident of 18182704 KB in the /usr/bin/time output below)
1902.14user 1397.48system 23:12.49elapsed 236%CPU (0avgtext+0avgdata 18182704maxresident)k
320392inputs+1639776outputs (285major+10407902minor)pagefaults 0swaps
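The peak figure comes from the `maxresident` field, which GNU time reports in kilobytes (treated here as 1024-byte units). A minimal Python sketch of that conversion; the helper name `peak_rss_gib` is made up for illustration and is not part of pyprophet:

```python
import re


def peak_rss_gib(time_output: str) -> float:
    """Pull `maxresident` (in KB) out of GNU time's default output and convert to GiB."""
    match = re.search(r"(\d+)maxresident", time_output)
    if match is None:
        raise ValueError("no maxresident field found")
    return int(match.group(1)) / 1024**2


line = (
    "1902.14user 1397.48system 23:12.49elapsed 236%CPU "
    "(0avgtext+0avgdata 18182704maxresident)k"
)
print(f"{peak_rss_gib(line):.2f} GiB")
```

The same helper applied to the phosphoproteomics run (10141896 KB) gives the ~9.67 figure quoted for that dataset.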
Note: The total memory allocated reported by memray is the virtual memory requested (e.g. by pandas, numpy, duckdb), not the physical memory that was actually materialized.
$ memray stats memray_pyp_score.bin
📏 Total allocations:
4923580
📦 Total memory allocated:
13.453GB
📊 Histogram of allocation size:
min: 1.000B
----------------------------------------------
< 7.000B : 79403 ▇
< 49.000B : 2911664 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 345.000B : 1472040 ▇▇▇▇▇▇▇▇▇▇▇▇▇
< 2.370KB : 391343 ▇▇▇▇
< 16.643KB : 55240 ▇
< 116.825KB: 6330 ▇
< 820.058KB: 6666 ▇
< 5.621MB : 443 ▇
< 39.460MB : 396 ▇
<=276.990MB: 55 ▇
----------------------------------------------
max: 276.990MB
📂 Allocator type distribution:
MALLOC: 4916019
MMAP: 6732
REALLOC: 702
CALLOC: 127
🥇 Top 15 largest allocating locations (by size):
- <stack trace unavailable> -> 6.132GB <- This is mostly duckdb
- __array__:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pandas/core/series.py:1031 -> 1.719GB
- copy:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pandas/core/internals/blocks.py:796 -> 1.017GB
- _fetch_ms2_features:/home/singjc/Documents/github/pyprophet/pyprophet/io/scoring/parquet.py:130 -> 978.292MB
- _take_nd_ndarray:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pandas/core/array_algos/take.py:157 -> 790.236MB
- _merge_ms1ms2_features:/home/singjc/Documents/github/pyprophet/pyprophet/io/scoring/parquet.py:218 -> 401.331MB
- _merge_blocks:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pandas/core/internals/managers.py:2301 -> 331.308MB
- vstack:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/numpy/_core/shape_base.py:287 -> 331.302MB
- _stack_arrays:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pandas/core/internals/managers.py:2252 -> 316.366MB
- maybe_convert_platform:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pandas/core/dtypes/cast.py:138 -> 222.684MB
- collect:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/polars/lazyframe/frame.py:2207 -> 135.000MB
- get_join_indexers_non_unique:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pandas/core/reshape/merge.py:1795 -> 130.348MB
- maybe_infer_to_datetimelike:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pandas/core/dtypes/cast.py:1189 -> 111.343MB
- _isna_array:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pandas/core/dtypes/missing.py:300 -> 107.266MB
- <listcomp>:/home/singjc/Documents/github/pyprophet/pyprophet/scoring/data_handling.py:239 -> 103.221MB
🥇 Top 15 largest allocating locations (by number of allocations):
- <stack trace unavailable> -> 3708032
- _fetch_ms2_features:/home/singjc/Documents/github/pyprophet/pyprophet/io/scoring/parquet.py:130 -> 897673
- __init__:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pyarrow/parquet/core.py:317 -> 97523
- _merge_ms1ms2_features:/home/singjc/Documents/github/pyprophet/pyprophet/io/scoring/parquet.py:218 -> 89539
- read:/home/singjc/Documents/github/pyprophet/pyprophet/io/scoring/parquet.py:46 -> 64332
- _init_duckdb_views:/home/singjc/Documents/github/pyprophet/pyprophet/io/_base.py:982 -> 46096
- open_binary:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/psutil/_common.py:711 -> 7398
- __array__:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pandas/core/series.py:1031 -> 5711
- read_schema:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pyarrow/parquet/core.py:2348 -> 2208
- _build_nested_paths:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pyarrow/parquet/core.py:337 -> 1797
- _to_pandas_without_object_columns:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/polars/dataframe/frame.py:2483 -> 613
- table_to_dataframe:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pyarrow/pandas_compat.py:808 -> 285
- _subst_vars:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/sysconfig.py:156 -> 180
- _extend_dict:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/sysconfig.py:168 -> 168
- _to_pandas_without_object_columns:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/polars/dataframe/frame.py:2484 -> 145
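To see which package dominates the "by size" list, the entries can be grouped by owner. The sketch below is only a rough aid: the `owner` and `bytes_by_owner` helpers are hypothetical, the package is guessed from the `site-packages/<name>/` part of the path, and sizes are assumed to be in 1024-based units.

```python
import re
from collections import defaultdict

# Assumption: memray prints sizes in 1024-based units.
UNIT_BYTES = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}
ENTRY = re.compile(r"^- (?P<loc>.+?) -> (?P<num>[\d.]+)(?P<unit>[KMG]?B)\b")


def owner(location: str) -> str:
    """Guess which package a memray location string belongs to."""
    pkg = re.search(r"site-packages/([^/]+)/", location)
    if pkg:
        return pkg.group(1)
    if location.startswith("<stack trace unavailable>"):
        return "untracked (mostly duckdb)"
    if "/pyprophet/pyprophet/" in location:
        return "pyprophet"
    return "other"


def bytes_by_owner(lines):
    """Sum the reported sizes per owner for '- location -> size' lines."""
    totals = defaultdict(float)
    for line in lines:
        m = ENTRY.match(line.strip())
        if m:
            totals[owner(m["loc"])] += float(m["num"]) * UNIT_BYTES[m["unit"]]
    return dict(totals)


# A few entries copied from the list above.
sample = [
    "- <stack trace unavailable> -> 6.132GB <- This is mostly duckdb",
    "- __array__:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pandas/core/series.py:1031 -> 1.719GB",
    "- copy:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pandas/core/internals/blocks.py:796 -> 1.017GB",
    "- _fetch_ms2_features:/home/singjc/Documents/github/pyprophet/pyprophet/io/scoring/parquet.py:130 -> 978.292MB",
    "- collect:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/polars/lazyframe/frame.py:2207 -> 135.000MB",
]
totals = bytes_by_owner(sample)
for name, size in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{name:28s} {size / 1024**3:7.3f} GB")
```

Pasting the full list instead of `sample` gives the split between untracked native allocations, pandas, and pyprophet's own code for either dataset.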
Phosphoproteomics Dataset
20 runs, ~45K precursors (targets + decoys), ~5.7M transitions, ~1.8M precursor features (peak-groups)
Command
/usr/bin/time pyprophet score --in merged.oswpq --level ms1ms2 --ss_num_iter 3 --xeval_num_iter 3 --profile
Peak RAM usage is ~9.67 GB (maxresident of 10141896 KB in the /usr/bin/time output below)
1271.60user 615.97system 18:56.20elapsed 166%CPU (0avgtext+0avgdata 10141896maxresident)k
168inputs+1204336outputs (95major+6227858minor)pagefaults 0swaps
Note: The total memory allocated reported by memray is the virtual memory requested (e.g. by pandas, numpy, duckdb), not the physical memory that was actually materialized.
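One further reason the memray total can be far larger than the peak RAM (179.028GB against ~9.67 GB in this run) is that, as far as I understand it, the total adds up every allocation made during the run, including buffers that were freed again. The small Python sketch below illustrates that effect with the standard-library `tracemalloc` module rather than memray; `CHUNK` and `ROUNDS` are arbitrary values chosen for the demonstration.

```python
import tracemalloc

CHUNK = 1_000_000  # bytes per temporary buffer (arbitrary)
ROUNDS = 50        # how many temporary buffers to create (arbitrary)

tracemalloc.start()
cumulative = 0
for _ in range(ROUNDS):
    buf = bytearray(CHUNK)   # allocate a temporary buffer...
    cumulative += len(buf)   # ...count it towards the running total...
    del buf                  # ...and free it before the next round
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"cumulative allocated: {cumulative / 1e6:.0f} MB")
print(f"peak live memory:     {peak / 1e6:.1f} MB")
```

The cumulative total grows with every round, while the peak stays close to the size of a single buffer because only one is alive at a time.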
$ memray stats memray_score.bin
📏 Total allocations:
8573096
📦 Total memory allocated:
179.028GB
📊 Histogram of allocation size:
min: 1.000B
----------------------------------------------
< 7.000B : 195538 ▇
< 60.000B : 5567520 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 473.000B : 1390693 ▇▇▇▇▇▇▇
< 3.604KB : 627854 ▇▇▇
< 28.088KB : 384064 ▇▇
< 218.924KB: 303930 ▇▇
< 1.666MB : 79467 ▇
< 12.988MB : 23061 ▇
< 101.226MB: 882 ▇
<=788.964MB: 87 ▇
----------------------------------------------
max: 788.964MB
📂 Allocator type distribution:
MALLOC: 8375834
REALLOC: 136973
CALLOC: 42899
MMAP: 17390
🥇 Top 15 largest allocating locations (by size):
- _take_nd_ndarray:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pandas/core/array_algos/take.py:157 -> 88.081GB
- <stack trace unavailable> -> 23.864GB
- <listcomp>:/home/singjc/Documents/github/pyprophet/pyprophet/report.py:173 -> 13.667GB
- copy:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pandas/core/internals/blocks.py:796 -> 8.521GB
- __array__:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pandas/core/series.py:1031 -> 4.738GB
- unique_with_mask:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pandas/core/algorithms.py:438 -> 4.728GB
- plot_identification_consistency:/home/singjc/Documents/github/pyprophet/pyprophet/report.py:176 -> 4.695GB
- unique_with_mask:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pandas/core/algorithms.py:440 -> 3.522GB
- vstack:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/numpy/_core/shape_base.py:287 -> 3.334GB
- _merge_blocks:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pandas/core/internals/managers.py:2301 -> 2.884GB
- _evaluate_standard:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pandas/core/computation/expressions.py:73 -> 2.780GB
- _fetch_ms2_features:/home/singjc/Documents/github/pyprophet/pyprophet/io/scoring/split_parquet.py:124 -> 1.554GB
- _stack_arrays:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pandas/core/internals/managers.py:2252 -> 1.499GB
- take:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pandas/core/algorithms.py:1239 -> 1.236GB
- _getitem_bool_array:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pandas/core/frame.py:4154 -> 1.234GB
🥇 Top 15 largest allocating locations (by number of allocations):
- <stack trace unavailable> -> 5635464
- _fetch_ms2_features:/home/singjc/Documents/github/pyprophet/pyprophet/io/scoring/split_parquet.py:124 -> 948497
- _init_duckdb_views:/home/singjc/Documents/github/pyprophet/pyprophet/io/_base.py:1246 -> 216931
- _take_nd_ndarray:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pandas/core/array_algos/take.py:157 -> 147627
- __init__:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pyarrow/parquet/core.py:317 -> 119820
- unique_with_mask:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pandas/core/algorithms.py:440 -> 107224
- <listcomp>:/home/singjc/Documents/github/pyprophet/pyprophet/report.py:173 -> 105035
- _any:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/numpy/_core/_methods.py:64 -> 86112
- unique_with_mask:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pandas/core/algorithms.py:438 -> 84076
- plot_identification_consistency:/home/singjc/Documents/github/pyprophet/pyprophet/report.py:176 -> 79352
- read:/home/singjc/Documents/github/pyprophet/pyprophet/io/scoring/split_parquet.py:49 -> 64332
- _write_parquet_with_scores:/home/singjc/Documents/github/pyprophet/pyprophet/io/scoring/split_parquet.py:345 -> 64260
- maybe_convert_indices:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/pandas/core/indexers/utils.py:280 -> 63383
- transform_affine:/home/singjc/anaconda3/envs/pyprophet/lib/python3.9/site-packages/matplotlib/transforms.py:1865 -> 55828
- _write_parquet_with_scores:/home/singjc/Documents/github/pyprophet/pyprophet/io/scoring/split_parquet.py:351 -> 51864
Which branch is this on? I don't seem to be getting a --profile option.
Should be available in the master branch: https://github.com/PyProphet/pyprophet/blob/master/pyprophet%2Fcli%2Fscore.py#L241
Thanks! I figured it out, I was on the wrong branch.