
Regarding the Principal component analysis with CPPTRAJ

Open satyajitkhatua09 opened this issue 1 year ago • 4 comments

Hi Users/Developers,

I am having trouble running a PCA analysis with CPPTRAJ. The run always stops with a segmentation fault.

I am only considering the CA atoms of the protein and the P atoms of the nucleic acid (28486 atoms in total). My covariance matrix is therefore 85,458 × 85,458 (3 × 28486 = 85,458), which consumes a substantial amount of memory:

Memory (bytes) = (85,458 × 85,458) × size of a double-precision float (typically 8 bytes) ≈ 58 GB.

This prompted me to run the analysis on the ANDES HPC system at Oak Ridge National Laboratory. However, the run still fails to produce results, and the error message remains the same. Are there any limits on the number of atoms that can be considered for PCA analysis in CPPTRAJ? If so, how can I change that (if possible)?
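The memory estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the helper name `covar_matrix_memory` is hypothetical, and it assumes the full 3N × 3N matrix is stored as 8-byte doubles (a symmetric matrix stored in triangular form would need roughly half as much).

```python
BYTES_PER_DOUBLE = 8  # size of an IEEE 754 double-precision float


def covar_matrix_memory(n_atoms):
    """Return (dimension, element count, size in GB) for a full 3N x 3N matrix.

    Hypothetical helper for illustration; assumes every element is stored.
    """
    dim = 3 * n_atoms                 # x, y, z per atom
    n_elements = dim * dim            # full square matrix
    gigabytes = n_elements * BYTES_PER_DOUBLE / 1e9
    return dim, n_elements, gigabytes


dim, n_elements, gigabytes = covar_matrix_memory(28486)
print(f"{dim} x {dim} = {n_elements} elements, {gigabytes:.3f} GB")
# prints: 85458 x 85458 = 7303069764 elements, 58.425 GB
```

The element count and size agree with the data set size that cpptraj reports in the log output later in this thread.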

satyajitkhatua09 avatar Dec 21 '24 17:12 satyajitkhatua09

Hi, sorry for the delay here.

I haven't delved into the code yet, but my suspicion is that an int is being used somewhere in matrix indexing; the maximum value of a 32-bit int is 2147483647, which is much smaller than the number of elements in your matrix. I'll try to get to this ASAP. Thanks for the report!
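To illustrate why a 32-bit index would be a problem at this size, here is a small Python sketch. It is not taken from the cpptraj source and does not confirm the cause of the crash; the helper `as_int32` is hypothetical and simply emulates two's-complement wraparound. In C++, signed overflow is formally undefined behavior, and wraparound is only the commonly observed outcome.

```python
INT32_MAX = 2**31 - 1  # 2147483647, largest value of a signed 32-bit int


def as_int32(value):
    """Emulate wrapping an integer into a signed 32-bit two's-complement int.

    Hypothetical helper for illustration only.
    """
    return (value + 2**31) % 2**32 - 2**31


n_elements = (3 * 28486) ** 2   # 7303069764 elements in the full matrix
last_index = n_elements - 1     # largest flat (1D) index into that matrix

print(last_index > INT32_MAX)   # True: the index does not fit in 32 bits
print(as_int32(last_index))     # -1286864829: a wrapped, negative index
```

A negative or otherwise wrong flat index used for a memory access is a typical way to end up with a segmentation fault, which would be consistent with the symptom reported above.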

drroe avatar Jan 14 '25 18:01 drroe

Sorry for the delay on this; recent issues at my workplace have made things challenging.

I should have asked this from the beginning: what version of cpptraj are you using? If it's an older version, this issue may have been fixed. Using the latest version (6.29.10 available via GitHub) I am able to process a 28486 x 28486 atom matrix without issues:

CPPTRAJ: Trajectory Analysis. V6.29.10 (GitHub)
    ___  ___  ___  ___
     | \/ | \/ | \/ | 
    _|_/\_|_/\_|_/\_|_

| Date/time: 02/26/25 09:08:23
| Available memory: 394.050 GB

INPUT: Reading input from 'largematrix.in'
  [parm amber.parm7]
	Reading 'amber.parm7' as Amber Topology
	Radius Set: modified Bondi radii (mbondi)
  [trajin final.1.nc 1 10]
	Reading 'final.1.nc' as Amber NetCDF
  [matrix name Large covar @1-28486 @28487-56972]
    MATRIX: Calculating covariance matrix, output is by atom.
	Matrix data set is 'Large'
	Start: 1  Stop: Final frame
	Mask1 is '@1-28486'
	Mask2 is '@28487-56972'
  [run]
---------- RUN BEGIN -------------------------------------------------

PARAMETER FILES (1 total):
 0: amber.parm7, 856922 atoms, 261525 res, box: Truncated octahedron, 257753 mol, 256476 solvent

INPUT TRAJECTORIES (1 total):
 0: 'final.1.nc' is a NetCDF (NetCDF3) AMBER trajectory with coordinates, time, box, Parm amber.parm7 (Truncated octahedron box) (reading 10 of 100)
  Coordinate processing will occur on 10 frames.

BEGIN TRAJECTORY PROCESSING:
.....................................................
ACTION SETUP FOR PARM 'amber.parm7' (1 actions):
  0: [matrix name Large covar @1-28486 @28487-56972]
	Mask [@1-28486] corresponds to 28486 atoms.
	Mask [@28487-56972] corresponds to 28486 atoms.
----- final.1.nc (1-10, 1) -----
 0% 11% 22% 33% 44% 56% 67% 78% 89% 100% Complete.

Read 10 frames and processed 10 frames.
TIME: Avg. throughput= 0.1719 frames / second.

ACTION OUTPUT:
TIME: Analyses took 0.0000 seconds.

DATASETS (1 total):
	Large "Large" (double matrix, matrix(covariance)), size is 7303069764 (58.425 GB)
    Total data set memory usage is at least 58.425 GB

RUN TIMING:
TIME:		Init               : 0.0000 s (  0.00%)
TIME:		Trajectory Process : 58.1840 s ( 86.16%)
TIME:		Action Post        : 9.3472 s ( 13.84%)
TIME:		Analysis           : 0.0000 s (  0.00%)
TIME:		Data File Write    : 0.0000 s (  0.00%)
TIME:		Other              : 0.0003 s (  0.00%)
TIME:	Run Total 67.5315 s
---------- RUN END ---------------------------------------------------
TIME: Total execution time: 68.3428 seconds.
--------------------------------------------------------------------------------
To cite CPPTRAJ use:
Daniel R. Roe and Thomas E. Cheatham, III, "PTRAJ and CPPTRAJ: Software for
  Processing and Analysis of Molecular Dynamics Trajectory Data". J. Chem.
  Theory Comput., 2013, 9 (7), pp 3084-3095.

drroe avatar Feb 26 '25 14:02 drroe

Hi Dr. Roe,

I was using the cpptraj bundled with the Amber22 package. Maybe that was the reason for the failure. I will try the most recent version of cpptraj and let you know the outcome.

I also wanted to ask whether the issue has been resolved in the cpptraj bundled with Amber24.

Yours sincerely, Satyajit Khatua



satyajitkhatua09 avatar Feb 26 '25 16:02 satyajitkhatua09

I tried V6.24.0, which is bundled with Amber24, and it also worked, so maybe try that. I still recommend using the GitHub version if that's convenient for you, though, as it is updated more frequently.

drroe avatar Feb 26 '25 17:02 drroe