MDSplus .data file gets very large

Open • margomw opened this issue • 38 comments

Affiliation: General Atomics

Version(s) Affected: 7.132-0.el7

Platform: CentOS Linux 7

Describe the bug: A development branch of the PCS code outputs data to both PTDATA (GA's in-house database) and MDSplus. The data is presumably identical. We ran 14 shots: the average PTDATA file size is 142 MB, while the average MDSplus size is 928 MB.

To Reproduce: Steps to reproduce the behavior:

  1. Run a development branch of PCS on the Saga or Iris computer (at GA)
  2. Cycle PCS
  3. Read the data in /fusion/d3d/d3share/d3data/d3dpcs_965507.datafile

Expected behavior: We expect the MDSplus file size to be on par with PTDATA, especially since compression is automatically turned on for MDSplus.

Additional context: The data can also be accessed from the Omega cluster.

margomw, Dec 02 '23 00:12

During a conversation with @kgerickson and @margomw, these additional details were provided:

  • When an array of 10K elements (int32, float32 or float64) is written to a simple tree, the file size is comparable to what is seen when the same data is written to GA's other data system, PTDATA.
  • The 8x to 10x data explosion only occurs when writing to the MDSplus shot tree (for the DIII-D plasma control system) that is created with the "create pulse" command.
  • The PCS model tree is created dynamically based on a configuration file.
  • The PCS model tree has no data in the "*.datafile" -- thus the data expansion is occurring on write to the shot tree.
  • The PCS tree does not use segmented signals.
  • All signals are created with the TDI build_signal(<data>, , <ref_to_timebase_node>) construct. No raw data is stored. Because this construct references a timebase node, it presumably stores only the node reference, so the timebase is not instantiated multiple times (which would increase the file size).

mwinkel-dev, Dec 02 '23 01:12

I ran ls -lh on the suggested shot, and it is only ~213 MB in total, which is ~71 MB bigger than (i.e., ~1.5x the size of) the PTDATA average of ~142 MB, but nowhere near the ~928 MB average reported in the initial bug report. In the following ls output, note the M and K suffixes on the file sizes.

$ ls -lh /fusion/d3d/d3share/d3data/d3dpcs_965507.*
-rw-r--r-- 1 8848 pcsgrp 1.1M Nov 13 12:20 /fusion/d3d/d3share/d3data/d3dpcs_965507.characteristics
-rw-r--r-- 1 8848 pcsgrp 211M Nov 13 12:20 /fusion/d3d/d3share/d3data/d3dpcs_965507.datafile
-rw-r--r-- 1 8848 pcsgrp 935K Nov 13 12:20 /fusion/d3d/d3share/d3data/d3dpcs_965507.tree

Is there another shot that is more representative of the ~8x to ~10x expansion of the data (compared to PTDATA)? If so, please suggest some other shots for me to examine.

Thanks, -MW

mwinkel-dev, Dec 05 '23 23:12

Mark,

The files can be downloaded from Omega, path /cscratch/kellera/

Here is a listing:

@.*** kellera]$ ls -lh
total 3.3G
-rw-rw-r-- 1 kellera kellera 860M Dec 6 09:11 d3dpcs_965504.datafile
-rw-rw-r-- 1 kellera kellera 857M Dec 6 09:11 d3dpcs_965505.datafile
-rw-rw-r-- 1 kellera kellera 950M Dec 6 09:11 d3dpcs_965506.datafile
-rw-rw-r-- 1 kellera kellera 963M Dec 6 09:11 d3dpcs_965508.datafile
-rw-rw-r-- 1 kellera kellera 944M Dec 6 09:11 d3dpcs_965509.datafile
-rw-rw-r-- 1 kellera kellera 888M Dec 6 09:11 d3dpcs_965510.datafile
-rw-rw-r-- 1 kellera kellera 880M Dec 6 09:11 d3dpcs_965511.datafile
-rw-rw-r-- 1 kellera kellera 864M Dec 6 09:11 d3dpcs_965512.datafile
-rw-rw-r-- 1 kellera kellera 910M Dec 6 09:11 d3dpcs_965513.datafile
-rw-rw-r-- 1 kellera kellera 924M Dec 6 09:11 d3dpcs_965514.datafile
-rw-rw-r-- 1 kellera kellera 914M Dec 6 09:11 d3dpcs_965515.datafile
-rw-rw-r-- 1 kellera kellera 1019M Dec 6 09:11 d3dpcs_965516.datafile
-rw-rw-r-- 1 kellera kellera 919M Dec 6 09:11 d3dpcs_965517.datafile
-rw-rw-r-- 1 kellera kellera 947M Dec 6 09:11 d3dpcs_965518.datafile
-rw-rw-r-- 1 kellera kellera 0 Dec 6 09:11 d3dpcs_model.datafile
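As a rough cross-check of the figures in the bug report, the C sketch below averages the 14 shot datafile sizes from this listing and compares the mean with the ~142 MB PTDATA average quoted in the report. It treats the M suffix from ls -lh as MB and excludes the empty model datafile; the sizes are rounded, so the results are approximate.

```c
#include <stddef.h>

/* Shot datafile sizes in MB, copied from the ls -lh listing above
   (965504-965518, excluding the 0-byte d3dpcs_model.datafile). */
static const double datafile_mb[] = {
    860, 857, 950, 963, 944, 888, 880,
    864, 910, 924, 914, 1019, 919, 947
};

/* Arithmetic mean of n sizes; returns 0 for an empty list. */
double mean_mb(const double *sizes, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += sizes[i];
    return n ? sum / (double)n : 0.0;
}

/* How many times larger the MDSplus file is than the PTDATA file. */
double expansion_ratio(double mdsplus_mb, double ptdata_mb) {
    return mdsplus_mb / ptdata_mb;
}
```

With these inputs the mean comes out at roughly 917 MB, about 6.5x the PTDATA average and close to the ~928 MB figure in the report (which may have been computed over a somewhat different set of shots).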

margomw, Dec 06 '23 17:12

Hi @margomw -- Thanks for the list of files. I will work with two shots: 965516 (the largest) and the model. I have copied both of those *.datafile files to my home directory.

However, I also need the *.tree and *.characteristics files for the 965516 shot and the model. Let me know when the *.tree and *.characteristics files are also available in /cscratch/kellera.

Thanks, -MW

mwinkel-dev, Dec 06 '23 22:12

@mwinkel-dev: *.tree and *.characteristics are installed in /cscratch/kellera on Omega. Please inspect them and tell us if you find anything interesting. Thank you.

margomw, Dec 07 '23 17:12

Hi @margomw and @kgerickson -- Thanks for the *.tree and *.characteristics files.

I did a cursory inspection and it appears that some nodes in the tree are not being compressed. Compressing the shot reduced its datafile by a factor of ~5.

Here are the steps . . .

  • I copied the d3dpcs_965516 shot to xmw_516.
  • Then ran mdstcl.
  • In that utility, ran the compress xmw /shot=516 command.
  • Quit mdstcl.
  • Ran ls -lh *516.datafile, which produced the following output:
$ ls -lh *516.datafile
-rw-rw-r-- 1 winkelm winkelm 1019M Dec  6 14:04 d3dpcs_965516.datafile
-rw-rw-r-- 1 winkelm winkelm  202M Dec  7 10:39 xmw_516.datafile

This is a surprising amount of compression. Later today, I will investigate to see which nodes in the tree were ~5x bigger than expected.

mwinkel-dev, Dec 07 '23 18:12

A spot check of the d3dpcs_965516 shot shows that some signals have the compress_on_put attribute, but many signals do not (i.e., they are just flagged as compressible). The timebases might have the same issue.

Because the model is generated dynamically, I suggest examining the program that creates it. Ensure that the program is adding the compress_on_put attribute to all signals and arrays.

After making that change, let me know if it fixed the data explosion problem for the PCS data.
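One way to follow this suggestion is to audit the flags of every node after the model tree is generated and list those that are compressible but lack compress_on_put. The C sketch below models the per-node flags as a bitmask; the flag names, bit values, node paths, and the node_info struct are hypothetical stand-ins rather than the real MDSplus NCI definitions, which a real audit would read through the MDSplus API (e.g. GETNCI).

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical flag bits; the real MDSplus NCI flag values differ. */
#define FLAG_COMPRESSIBLE    0x1u /* record could be compressed */
#define FLAG_COMPRESS_ON_PUT 0x2u /* compress when the record is written */

/* Hypothetical node record: a path and its flag word. */
struct node_info {
    const char *path;
    unsigned flags;
};

/* Print every node that is compressible but lacks compress_on_put,
   and return how many such nodes were found. */
size_t report_uncompressed(const struct node_info *nodes, size_t n) {
    size_t missing = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned f = nodes[i].flags;
        if ((f & FLAG_COMPRESSIBLE) && !(f & FLAG_COMPRESS_ON_PUT)) {
            printf("missing compress_on_put: %s\n", nodes[i].path);
            missing++;
        }
    }
    return missing;
}
```

Checking a flag mask rather than a single boolean makes it easy to extend the audit to other attributes that the model-generation script is expected to set.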

mwinkel-dev, Dec 07 '23 19:12

Thanks for that! I know where to look for that fix.

kgerickson, Dec 07 '23 20:12

Hi @kgerickson and @margomw -- Did adding compress_on_put eliminate the problem with the huge datafiles? If so, let me know and I'll then close this issue.

mwinkel-dev, Dec 13 '23 19:12

Hello

I have patched the MDSTCL script to set the compress_on_put attribute. I am waiting for the other engineer to independently test the fix.

Martin

margomw, Dec 13 '23 19:12

Hi Martin,

Thanks for the update.

-Mark

mwinkel-dev, Dec 13 '23 21:12

Mark

Please inspect the newly patched MDSplus trees. The sizes still look inflated. Files are in /cscratch/kellera/, named d3dpcs_965525.{datafile|tree|characteristics}

The .datafile is 947 MB.

margomw, Dec 13 '23 23:12

Disregard previous update.

The new MDS data file size is on par with, or better than, PTDATA after setting the compress_on_put attribute for each node.

81K -rw-rw-r-- 1 kellera kellera 989K Dec 13 16:15 d3dpcs_965525.characteristics
90M -rw-rw-r-- 1 kellera kellera 145M Dec 13 16:15 d3dpcs_965525.datafile
273K -rw-rw-r-- 1 kellera kellera 848K Dec 13 16:15 d3dpcs_965525.tree

margomw, Dec 14 '23 03:12

Hi @margomw and @kgerickson,

Thanks for the update. It is good news that the MDSplus datafiles are now comparable in size to the PTDATA files.

I therefore intend to close this issue later today.

mwinkel-dev, Dec 14 '23 16:12

As per previous post, closing this issue as resolved.

mwinkel-dev, Dec 15 '23 18:12

Ditto

mwinkel-dev, Dec 15 '23 18:12

Reopening this issue because of new observations. The "compress_on_put" attribute is now enabled by the patched MDSplus model tree generation script. However, after the new pulse is created and the data is written, the datafile remains large and the attribute is no longer set.

Is it possible that the logic to populate the node with signals is able to overwrite the 'compress_on_put' attribute that is set on the model tree?

See screenshots

  1. Created the model tree using the updated version of the addMDs.bash script (Screenshot_2023-12-21_121140)
  2. Created the shot using MDSTCL (create_pulse 965534)
  3. Ran a PCS simulation shot, archiving to 965534
  4. Outcome: compress_on_put is no longer seen in GETNCI

Screenshot_2023-12-21_120805

margomw, Dec 27 '23 20:12

Hi @margomw and @kgerickson -- As per Martin's request, I am reopening this issue.

mwinkel-dev, Dec 28 '23 00:12

Hi @margomw -- I just did a quick experiment using mdstcl to write a signal to a shot, and it did preserve the compress_on_put attribute.

So, my initial conclusion is that your conjecture is correct -- that it is indeed possible for the script that is populating your shot tree to remove the compress_on_put attribute.

I will do more experiments in the next day or two.

mwinkel-dev, Dec 28 '23 01:12

Hi @margomw -- I've just done a cursory spot check of the d3dpcs_965534 tree and noticed the following:

  • Signals that do not contain data, such as IPA1ECOIL and IPA1LI, have retained the compress_on_put attribute.
  • However, signals that do contain data have lost the compress_on_put attribute.

So your conjecture is likely correct that there is something awry with the program that writes data to the shot tree. Is that program written in Python, IDL, C or MATLAB? If you provide more details, I will conduct some quick experiments with the associated MDSplus API.

An alternative approach for compressing the datafile is to use the compress command of mdstcl.

COMPRESS
Rewrites the datafile of the shot currently open, compressing any records that can be compressed and are not explicitly set nocompress.

Format: COMPRESS experiment [/SHOT=shot-number]

EXPERIMENT
Name of experiment.

/SHOT=shot_number
Specifies the shot number of the tree to open. Default = -1 which is the model.

https://www.mdsplus.org/index.php/Documentation:Reference:TCL_0
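Because the compress command works on one shot at a time, a batch of existing shots could be handled by generating one command per shot. The C sketch below only formats command lines following the documented syntax above; the helper names are hypothetical, and feeding the output to mdstcl (for example via a pipe) assumes mdstcl reads commands from standard input, which should be checked locally.

```c
#include <stdio.h>

/* Format one command following the documented syntax
   "COMPRESS experiment [/SHOT=shot-number]".
   Returns the number of characters snprintf would write. */
int format_compress(char *buf, size_t len, const char *experiment, int shot) {
    return snprintf(buf, len, "COMPRESS %s /SHOT=%d", experiment, shot);
}

/* Print one COMPRESS command per shot for shots first..last (inclusive). */
void print_compress_script(const char *experiment, int first, int last) {
    char line[128];
    for (int shot = first; shot <= last; shot++) {
        format_compress(line, sizeof line, experiment, shot);
        puts(line);
    }
}
```

For example, calling print_compress_script("d3dpcs", 965504, 965518) from a small main() prints one line per shot in that range. As in the experiment earlier in this thread, it is prudent to try this on a copy of a shot first.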

And although unrelated to data compression, note that your second screenshot above shows an insertion date of 16-Nov-1858, which likely means that the system clock is not set correctly on one of the computers you are using.

MIT is closed this week. When I return to work in January, I will resume investigation of this issue.

mwinkel-dev, Dec 28 '23 16:12

Hi @margomw -- Also note that the d3dpcs_965534.datafile is unusually small, namely ~7.6 MB.

mwinkel-dev, Dec 28 '23 17:12

The MDS wrapper that Keith wrote calls this:

return MdsPut2((char *)node, "BUILD_WITH_UNITS($, $)", &d1, (void *)val, &d2, units, &null);

Is it possible that certain data types are incompressible?

Do you recommend using MdsPut vs MdsPut2?

I don't think the MDSplus server configuration is a factor, as the files are written locally.

margomw, Dec 28 '23 20:12

Hi @margomw -- Thanks for the additional information. Looks like the wrapper is written in the C language, so I will experiment with that API when I return to work after the holiday break.

Regarding your questions, here are some preliminary answers.

  • The data being written to d3dpcs_965534 is definitely compressible. I used TCL's compress command on a copy of the shot. Compression reduced the datafile to ~3.4 MB, which is ~44% of the original size of ~7.6 MB.

  • The MdsPut2() function is recommended because it avoids some issues that optimizing compilers can cause with the original MdsPut().

  • Because the model and shots are all local on the computer, the compression problem is most likely an issue with the programs associated with the local workflow (dynamically creating the model tree, creating the pulse, writing data to the pulse).

  • The insertion date of 16-Nov-1858 in your screenshot likely means that the clock on the local computer is wrong.

  • Unlikely given how long the C API has been around, but perhaps it contains a bug. I will exercise the API when I return to work after the holidays.

mwinkel-dev, Dec 29 '23 16:12

The insertion time being 16-Nov-1858 is probably not about the clock being wrong. This is suspiciously close to the OpenVMS epoch, on which the time representation is based:

The operating system maintains the current date and time in 64-bit format. The time value is a binary number in 
100-nanosecond (ns) units offset from the system base date and time, which is 00:00 o'clock, November 17, 1858 
(the Smithsonian base date and time for the astronomic calendar).
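Assuming, as the quoted documentation suggests, that the insertion time is stored as a 64-bit count of 100 ns ticks since 17-Nov-1858 00:00, a zero (never-set) timestamp decodes to exactly the base date. The C sketch below converts such a value to Unix seconds and then to a calendar date, using Howard Hinnant's civil_from_days algorithm for the date arithmetic; the function names are illustrative and are not part of the MDSplus API.

```c
#include <stdint.h>

/* Seconds from the OpenVMS base time (1858-11-17 00:00 UTC) to the
   Unix epoch (1970-01-01 00:00 UTC): 40587 days * 86400 s/day. */
#define VMS_TO_UNIX_OFFSET_S INT64_C(3506716800)
#define VMS_TICKS_PER_SECOND INT64_C(10000000) /* 100 ns ticks */

/* Floor division, so negative values round toward -infinity. */
static int64_t floor_div(int64_t a, int64_t b) {
    int64_t q = a / b;
    if ((a % b != 0) && ((a < 0) != (b < 0)))
        q--;
    return q;
}

/* Convert an OpenVMS-style time (100 ns ticks since 1858-11-17 00:00)
   to whole Unix seconds, rounding down. */
int64_t vms_to_unix_seconds(int64_t vms_ticks) {
    return floor_div(vms_ticks, VMS_TICKS_PER_SECOND) - VMS_TO_UNIX_OFFSET_S;
}

/* Convert days since 1970-01-01 to a proleptic Gregorian date
   (Howard Hinnant's civil_from_days). */
void civil_from_days(int64_t z, int *year, int *month, int *day) {
    z += 719468;                                   /* shift epoch to 0000-03-01 */
    int64_t era = (z >= 0 ? z : z - 146096) / 146097;
    int64_t doe = z - era * 146097;                /* day of 400-year era */
    int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    int64_t mp = (5 * doy + 2) / 153;              /* month, March = 0 */
    int64_t d = doy - (153 * mp + 2) / 5 + 1;
    int64_t m = mp < 10 ? mp + 3 : mp - 9;
    *year = (int)(yoe + era * 400 + (m <= 2));
    *month = (int)m;
    *day = (int)d;
}
```

A time of 0 decodes to 1858-11-17 00:00 UTC. If the value is then displayed in local time in a zone west of UTC (for example, US Pacific time), that instant falls on 16-Nov-1858, which would be consistent with the date in the screenshot.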

joshStillerman, Jan 02 '24 15:01

MdsPut2 does not do any tree editing or set the node attributes. I suspect that there is some code that is editing the tree and re-adding the node, and that the C/C++ API that adds nodes does not default to compress_on_put.

joshStillerman, Jan 02 '24 15:01

Hi @margomw -- Thank you for having me re-open this issue. I wrote a dinky C program and confirmed that MdsPut2() is clearing the compress_on_put flag (and other flags too) on a local tree.

NOTE: Based on inspection of the source code, it is likely that this bug only affects local trees.

As a temporary workaround, here are two suggestions:

  • Switch to the MdsPut() function -- I confirmed that it does preserve the compress_on_put flag. (Note, though, that for MdsPut() to work reliably, it might be necessary to compile your source code with minimal optimization.)
  • Or after all the PCS data is written to the tree, then use the compress command of mdstcl to reduce the size of the data file.

Meanwhile, I will continue investigating to find out why MdsPut2() is clearing all flags.
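To illustrate the general failure mode (not the actual MDSplus source, whose typo is not shown in this thread), the C sketch below contrasts a put that overwrites a node's whole flag word with one that updates only the bit it owns. The flag names, bit values, and the node struct are hypothetical.

```c
/* Hypothetical flag bits; not the real MDSplus NCI values. */
#define FLAG_COMPRESS_ON_PUT 0x1u
#define FLAG_WRITE_ONCE      0x2u
#define FLAG_HAS_DATA        0x4u /* hypothetical bit set by the put itself */

/* Hypothetical node: only the flag word matters here. */
struct node {
    unsigned flags;
};

/* Buggy pattern: assigning the flag word discards every other attribute,
   including compress_on_put. */
void put_clobbering(struct node *n) {
    n->flags = FLAG_HAS_DATA;
}

/* Fixed pattern: set only the bit the put is responsible for and keep
   the attributes configured in the model tree. */
void put_preserving(struct node *n) {
    n->flags |= FLAG_HAS_DATA;
}

/* Would data written to this node be compressed on put? */
int compresses_on_put(const struct node *n) {
    return (n->flags & FLAG_COMPRESS_ON_PUT) != 0;
}
```

The clobbering version matches the symptom reported above: nodes that never receive data keep their attributes, while nodes that are written lose all of them.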

mwinkel-dev, Jan 06 '24 00:01

Hi @margomw and @kgerickson -- There is now a fix for the MdsPut2() bug that was clearing the compress_on_put flag. The fix will be merged to alpha as soon as we have finished maintenance work on the build server. (Plus, additional testing will also be done.)

mwinkel-dev, Jan 06 '24 01:01

Manual testing shows that the MdsPut2() fix works for local and distributed trees.

Next task is to test the fix with remote trees accessed via mdsip (aka "thin client").

Also confirmed that MdsPut() is a viable workaround (although a reduced level of compiler optimization might be needed).

mwinkel-dev, Jan 06 '24 20:01

There is a bug in MdsPut2() for local files (i.e., an obvious typo). However, testing the fix with mdsip between two VMs is still underway. Currently, it appears that the bug exists in Ubuntu 20 (x86_64) and Rocky 9.1 (arm64). Surprisingly though the bug doesn't appear on Ubuntu 22 (x86_64 and arm64). However, the fix works fine on all platforms.

Next step is to test the bug and fix over mdsip between two Rocky 9.1 VMs. And repeat the Ubuntu 20 (x86_64) test just to double check previous findings.

mwinkel-dev, Jan 10 '24 04:01

The fix was fully tested (manually) on two Rocky 9.3 (arm64) VMs. Test results confirm the fix is good.

Here are the findings . . .

  • The node attributes (including compress_on_put) are only getting clobbered by MdsPut2(). No problems were seen with the other function, MdsPut().
  • Furthermore, the bug only affects two modes of operation: local trees and "distributed" trees. The fix will be needed on any computers that use MdsPut2() with these two modes.
  • The bug does not affect "thin-client" operation (i.e., use of mdsconnect to write remote trees via mdsip). Thus, although it is advisable to upgrade mdsip servers to this bug fix, it is not essential to do so.

To ensure there are no oddities with the arm64 architecture, the above manual tests will now be repeated on Rocky 9.3 (x86_64) VMs.

mwinkel-dev, Jan 10 '24 19:01