Regarding the Principal component analysis with CPPTRAJ
Hi Users/Developers,
I am having issues running PCA with CPPTRAJ. The run always stops with a segmentation fault.
I am only considering the CA atoms of the protein and the P atoms of the nucleic acid (a total of 28486 atoms). Therefore, my covariance matrix size is 85,458 × 85,458 (3 × 28486 = 85,458), which consumes a substantial amount of memory:
Memory (bytes) = (85,458 × 85,458) × size of a double-precision float (typically 8 bytes) ≈ 58 GB.
This prompted me to run the analysis on the ANDES HPC system at Oak Ridge National Laboratory. However, the run still fails to produce results, and the error message remains the same. So, is there a limit on the number of atoms that can be considered for PCA in CPPTRAJ? If so, is it possible to change it, and how?
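For anyone who wants to redo the memory estimate above, here is a minimal Python sketch. It assumes 3 Cartesian coordinates per atom, 8 bytes per double, and that the full square matrix is stored; the function name is made up for illustration and is not part of CPPTRAJ.

```python
# Rough memory estimate for a full square covariance matrix.
# Assumptions: 3 coordinates (x, y, z) per atom, 8 bytes per double,
# and every element of the square matrix is stored.

def covariance_matrix_size(n_atoms, bytes_per_element=8):
    """Return (dimension, number of elements, bytes) for a 3N x 3N matrix."""
    dim = 3 * n_atoms
    n_elements = dim * dim
    return dim, n_elements, n_elements * bytes_per_element

dim, n_elements, n_bytes = covariance_matrix_size(28486)
print(f"dimension: {dim} x {dim}")
print(f"elements : {n_elements}")
print(f"memory   : {n_bytes / 1e9:.3f} GB")
```

If a program stores only one triangle of a symmetric matrix, the requirement would be roughly half of this figure, so treat the number as an upper bound rather than an exact prediction.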
Hi, sorry for the delay here.
I haven't delved into the code yet, but my suspicion is that there is an int somewhere being used in matrix indexing; the maximum value of a 32-bit signed int is 2147483647, which is much smaller than the number of elements in your matrix. I'll try to get to this ASAP. Thanks for the report!
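To illustrate why an int index could be a problem at this size, here is a small Python sketch. It is only an illustration of the suspicion above, not a confirmed diagnosis of the CPPTRAJ code. The helper `wrap_int32` is hypothetical and mimics two's-complement wraparound; in C++ signed overflow is actually undefined behavior, so wraparound is just the commonly observed outcome.

```python
INT32_MAX = 2**31 - 1  # 2147483647, largest value of a 32-bit signed int

def wrap_int32(value):
    """Value that results if `value` is squeezed into a 32-bit signed int
    with two's-complement wraparound (hypothetical helper for illustration)."""
    return (value + 2**31) % 2**32 - 2**31

n_atoms = 28486
dim = 3 * n_atoms           # rows/columns of the covariance matrix
n_elements = dim * dim      # total elements in the full square matrix
last_index = n_elements - 1 # flat index of the final element

print("elements exceed INT32_MAX:", n_elements > INT32_MAX)
print("last flat index as int32 :", wrap_int32(last_index))
```

A flat index that wraps to a negative number would point outside the allocated array, which is one way a segmentation fault could arise.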
Sorry for the delay on this; recent issues at my workplace have made things challenging.
I should have asked this from the beginning: what version of cpptraj are you using? If it's an older version, this issue may have been fixed. Using the latest version (6.29.10 available via GitHub) I am able to process a 28486 x 28486 atom matrix without issues:
CPPTRAJ: Trajectory Analysis. V6.29.10 (GitHub)
___ ___ ___ ___
| \/ | \/ | \/ |
_|_/\_|_/\_|_/\_|_
| Date/time: 02/26/25 09:08:23
| Available memory: 394.050 GB
INPUT: Reading input from 'largematrix.in'
[parm amber.parm7]
Reading 'amber.parm7' as Amber Topology
Radius Set: modified Bondi radii (mbondi)
[trajin final.1.nc 1 10]
Reading 'final.1.nc' as Amber NetCDF
[matrix name Large covar @1-28486 @28487-56972]
MATRIX: Calculating covariance matrix, output is by atom.
Matrix data set is 'Large'
Start: 1 Stop: Final frame
Mask1 is '@1-28486'
Mask2 is '@28487-56972'
[run]
---------- RUN BEGIN -------------------------------------------------
PARAMETER FILES (1 total):
0: amber.parm7, 856922 atoms, 261525 res, box: Truncated octahedron, 257753 mol, 256476 solvent
INPUT TRAJECTORIES (1 total):
0: 'final.1.nc' is a NetCDF (NetCDF3) AMBER trajectory with coordinates, time, box, Parm amber.parm7 (Truncated octahedron box) (reading 10 of 100)
Coordinate processing will occur on 10 frames.
BEGIN TRAJECTORY PROCESSING:
.....................................................
ACTION SETUP FOR PARM 'amber.parm7' (1 actions):
0: [matrix name Large covar @1-28486 @28487-56972]
Mask [@1-28486] corresponds to 28486 atoms.
Mask [@28487-56972] corresponds to 28486 atoms.
----- final.1.nc (1-10, 1) -----
0% 11% 22% 33% 44% 56% 67% 78% 89% 100% Complete.
Read 10 frames and processed 10 frames.
TIME: Avg. throughput= 0.1719 frames / second.
ACTION OUTPUT:
TIME: Analyses took 0.0000 seconds.
DATASETS (1 total):
Large "Large" (double matrix, matrix(covariance)), size is 7303069764 (58.425 GB)
Total data set memory usage is at least 58.425 GB
RUN TIMING:
TIME: Init : 0.0000 s ( 0.00%)
TIME: Trajectory Process : 58.1840 s ( 86.16%)
TIME: Action Post : 9.3472 s ( 13.84%)
TIME: Analysis : 0.0000 s ( 0.00%)
TIME: Data File Write : 0.0000 s ( 0.00%)
TIME: Other : 0.0003 s ( 0.00%)
TIME: Run Total 67.5315 s
---------- RUN END ---------------------------------------------------
TIME: Total execution time: 68.3428 seconds.
--------------------------------------------------------------------------------
To cite CPPTRAJ use:
Daniel R. Roe and Thomas E. Cheatham, III, "PTRAJ and CPPTRAJ: Software for
Processing and Analysis of Molecular Dynamics Trajectory Data". J. Chem.
Theory Comput., 2013, 9 (7), pp 3084-3095.
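As a quick cross-check of the log above, the following Python sketch parses the DATASETS line and compares the reported element count with (3 × 28486)². The regular expression is an assumption based only on the line format shown in this log, not on any documented CPPTRAJ output specification.

```python
import re

# Line copied from the DATASETS section of the log above.
log_line = 'Large "Large" (double matrix, matrix(covariance)), size is 7303069764 (58.425 GB)'

# Assumed format: "size is <elements> (<gigabytes> GB)"
match = re.search(r"size is (\d+) \(([\d.]+) GB\)", log_line)
n_elements = int(match.group(1))
reported_gb = float(match.group(2))

n_atoms = 28486
expected_elements = (3 * n_atoms) ** 2

print("element count matches (3N)^2:", n_elements == expected_elements)
print("8 bytes/element matches GB  :", abs(n_elements * 8 / 1e9 - reported_gb) < 0.001)
```

Both checks agreeing suggests this test run allocated a matrix with the same number of elements as the estimate in the original report.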
Hi Dr. Roe,
I was using the cpptraj bundled with the Amber22 package. Maybe that was the reason behind the failure. I will try the recent version of cpptraj and let you know the outcome.
I also wanted to ask whether the issue has been resolved in the cpptraj bundled with Amber24.
Yours sincerely, Satyajit Khatua
I tried V6.24.0, which is bundled with Amber24, and it also worked, so maybe try that. I still recommend using the GitHub version if it's convenient for you, though, as that version is updated more frequently.