MDSplus .data file gets very large
Affiliation General Atomics
Version(s) Affected 7.132-0.el7
Platform CentOS Linux 7
Describe the bug A development branch of PCS code outputs data to PTDATA (GA's in-house database) and MDSplus. The data is presumably identical. We ran 14 shots; the average PTDATA file size is 142 MB, while the average MDSplus file size is 928 MB.
To Reproduce Steps to reproduce the behavior:
- Run a development branch of PCS on Saga or Iris computer (at GA)
- Cycle PCS
- Read the data in /fusion/d3d/d3share/d3data/d3dpcs_965507.datafile
Expected behavior We expect the MDSplus file size to be on par with PTDATA, especially since compression is automatically turned on for MDSplus.
Additional context The data can also be accessed from the Omega cluster.
During a conversation with @kgerickson and @margomw, these additional details were provided:
- When an array of 10K elements (int32, float32 or float64) is written to a simple tree, the file size is comparable to what is seen when the same data is written to GA's other data system, PTDATA.
- The 8x to 10x data explosion only occurs when writing to the MDSplus shot tree (for the DIII-D plasma control system) that is created with the "create pulse" command.
- The PCS model tree is created dynamically based on a configuration file.
- The PCS model tree has no data in the "*.datafile" -- thus the data expansion is occurring on write to the shot tree.
- The PCS tree does not use segmented signals.
- All signals are created with TCL's `build_signal(<data>, , <ref_to_timebase_node>)` construct. No raw data is written. And because this construct references a timebase node, it presumably stores only the node reference -- thus the timebase is not instantiated multiple times and does not inflate the file size. (A sketch of this pattern follows the list.)
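For reference, here is a minimal C sketch of that pattern using the classic MdsLib API. This is not the PCS code: the node names, shot number, and data are made up for illustration, and the TCL `build_signal` construct is expressed as the equivalent `BUILD_SIGNAL` TDI expression.

```c
/* Hypothetical sketch (not the PCS wrapper): write one signal whose dimension
 * is a reference to a shared timebase node, using the MdsLib API.
 * Tree name matches the PCS tree; node names, shot, and data are illustrative. */
#include <stdio.h>
#include <mdslib.h>   /* MdsOpen, MdsPut, descr, DTYPE_FLOAT */

int main(void)
{
  int shot = 965516;                 /* example shot number */
  if (!(MdsOpen("d3dpcs", &shot) & 1)) {
    fprintf(stderr, "MdsOpen failed\n");
    return 1;
  }

  static float data[10000];          /* 10K-element test array, as above */
  int nval = 10000, null = 0;
  int dtype_float = DTYPE_FLOAT;
  int d1 = descr(&dtype_float, data, &nval, &null);

  /* Only the data is stored with the signal; the dimension is a reference
   * to an existing timebase node, so the timebase is not duplicated. */
  int status = MdsPut("\\TOP:MYSIG",
                      "BUILD_SIGNAL($, *, \\TOP:TIMEBASE)", &d1, &null);
  if (!(status & 1))
    fprintf(stderr, "MdsPut failed\n");
  return 0;
}
```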
Did an `ls -lh` of the suggested shot and it is only ~213 MB in size, which is ~71 MB (i.e., ~1.5x) bigger than the PTDATA average of ~142 MB, but not at all close to the ~928 MB average size reported in the initial bug report. In the following `ls` output, note the `M` and `K` suffixes on the file sizes.
$ ls -lh /fusion/d3d/d3share/d3data/d3dpcs_965507.*
-rw-r--r-- 1 8848 pcsgrp 1.1M Nov 13 12:20 /fusion/d3d/d3share/d3data/d3dpcs_965507.characteristics
-rw-r--r-- 1 8848 pcsgrp 211M Nov 13 12:20 /fusion/d3d/d3share/d3data/d3dpcs_965507.datafile
-rw-r--r-- 1 8848 pcsgrp 935K Nov 13 12:20 /fusion/d3d/d3share/d3data/d3dpcs_965507.tree
Is there another shot that is more representative of the ~8x to ~10x expansion of the data (compared to PTDATA)? If so, please suggest some other shots for me to examine.
Thanks, -MW
Mark,
The files can be downloaded from Omega, path /cscratch/kellera/
Here is a listing:
$ ls -lh
total 3.3G
-rw-rw-r-- 1 kellera kellera 860M Dec 6 09:11 d3dpcs_965504.datafile
-rw-rw-r-- 1 kellera kellera 857M Dec 6 09:11 d3dpcs_965505.datafile
-rw-rw-r-- 1 kellera kellera 950M Dec 6 09:11 d3dpcs_965506.datafile
-rw-rw-r-- 1 kellera kellera 963M Dec 6 09:11 d3dpcs_965508.datafile
-rw-rw-r-- 1 kellera kellera 944M Dec 6 09:11 d3dpcs_965509.datafile
-rw-rw-r-- 1 kellera kellera 888M Dec 6 09:11 d3dpcs_965510.datafile
-rw-rw-r-- 1 kellera kellera 880M Dec 6 09:11 d3dpcs_965511.datafile
-rw-rw-r-- 1 kellera kellera 864M Dec 6 09:11 d3dpcs_965512.datafile
-rw-rw-r-- 1 kellera kellera 910M Dec 6 09:11 d3dpcs_965513.datafile
-rw-rw-r-- 1 kellera kellera 924M Dec 6 09:11 d3dpcs_965514.datafile
-rw-rw-r-- 1 kellera kellera 914M Dec 6 09:11 d3dpcs_965515.datafile
-rw-rw-r-- 1 kellera kellera 1019M Dec 6 09:11 d3dpcs_965516.datafile
-rw-rw-r-- 1 kellera kellera 919M Dec 6 09:11 d3dpcs_965517.datafile
-rw-rw-r-- 1 kellera kellera 947M Dec 6 09:11 d3dpcs_965518.datafile
-rw-rw-r-- 1 kellera kellera 0 Dec 6 09:11 d3dpcs_model.datafile
Hi @margomw -- Thanks for the list of files. I will work with two shots: 965516 (the largest) and the model. I have copied both of those *.datafile files to my home directory.
However, I also need the *.tree and *.characteristics files for the 965516 shot and the model. Let me know when the *.tree and *.characteristics files are also available in /cscratch/kellera.
Thanks, -MW
@mwinkel-dev : *.tree and *.characteristics are installed in /cscratch/kellera on Omega. Please inspect and tell us if you find anything interesting. Thank you
Hi @margomw and @kgerickson -- Thanks for the *.tree and *.characteristics files.
I did a cursory inspection and it appears that some nodes in the tree are not being compressed. Compressing the shot reduced its datafile by a factor of ~5.
Here are the steps . . .
- I took the `d3dpcs_965516` shot and copied it to `xmw_516`
- Then ran `mdstcl`
- And in that utility, ran the `compress xmw /shot=516` command
- Quit `mdstcl`
- And ran a `ls -lh *516.datafile`, which produced the following output
$ ls -lh *516.datafile
-rw-rw-r-- 1 winkelm winkelm 1019M Dec 6 14:04 d3dpcs_965516.datafile
-rw-rw-r-- 1 winkelm winkelm 202M Dec 7 10:39 xmw_516.datafile
This is a surprising amount of compression. Later today, I will investigate to see which nodes in the tree were ~5x bigger than expected.
A spot check of the `d3dpcs_965516` shot shows that some signals have the `compress_on_put` attribute, but many signals do not (i.e., they are just flagged as `compressible`). The timebases might have the same issue.
Because the model is generated dynamically, I suggest examining the program that creates it. Ensure that the program is adding the `compress_on_put` attribute to all signals and arrays.
After making that change, let me know if it fixed the data explosion problem for the PCS data.
Thanks for that! I know where to look for that fix.
Hi @kgerickson and @margomw -- Did adding `compress_on_put` eliminate the problem with the huge datafiles? If so, let me know and I'll then close this issue.
Hello
I have patched the MDSTCL script to set the compress_on_put. I am waiting for the other engineer to independently try the fix.
Martin
Hi Martin,
Thanks for the update.
-Mark
Mark
Please inspect the newly patched MDSplus trees. The size still looks inflated. Files are in /cscratch/kellera/, named d3dpcs_965525.{datafile|tree|characteristics}.
The .datafile is 947 MB.
Disregard previous update.
The new MDS data file size is on par with, or better than, PTDATA after setting the `compress_on_put` attribute for each node.
81K  -rw-rw-r-- 1 kellera kellera 989K Dec 13 16:15 d3dpcs_965525.characteristics
90M  -rw-rw-r-- 1 kellera kellera 145M Dec 13 16:15 d3dpcs_965525.datafile
273K -rw-rw-r-- 1 kellera kellera 848K Dec 13 16:15 d3dpcs_965525.tree
Hi @margomw and @kgerickson,
Thanks for the update. It is good news that the MDSplus datafiles are now comparable in size to the PTDATA files.
I therefore intend to close this issue later today.
As per previous post, closing this issue as resolved.
Ditto
Reopening this issue because of new observations. Indeed, the "compress_on_put" attribute is enabled now with the patched MDSplus model tree generation script. However, after the new pulse is created, the datafile remains large and the attribute appears to have changed.
Is it possible that the logic to populate the node with signals is able to overwrite the 'compress_on_put' attribute that is set on the model tree?
See screenshots
- Created model tree using updated version of the addMDs.bash script
- Created the shot using MDSTCL (create_pulse 965534)
- Ran a PCS simulation shot archiving to 965534
- Outcome: compress_on_put is no longer seen in GETNCI
Hi @margomw and @kgerickson -- As per Martin's request, am reopening this issue.
Hi @margomw -- I just did a quick experiment using `mdstcl` to write a signal to a shot, and it did preserve the `compress_on_put` attribute.
So, my initial conclusion is that your conjecture is correct -- that it is indeed possible for the script that is populating your shot tree to remove the `compress_on_put` attribute.
I will do more experiments in the next day or two.
Hi @margomw -- I've just done a cursory spot check of the `d3dpcs_965534` tree and noticed the following.
- Signals that do not contain data, such as `IPA1ECOIL` and `IPA1LI`, have retained the `compress_on_put` attribute.
- However, signals that do have data have lost the `compress_on_put` attribute.
So your conjecture is likely correct that there is something awry with the program that writes data to the shot tree. Is that program written in Python, IDL, C or MATLAB? If you provide more details, I will conduct some quick experiments with the associated MDSplus API.
An alternative approach for compressing the datafile is to use the `compress` command of `mdstcl`.
COMPRESS
Rewrites the datafile of the shot currently open, compressing any records that can be compressed and are not explicitly set nocompress.
Format: COMPRESS experiment [/SHOT=shot-number]
EXPERIMENT
Name of experiment.
/SHOT=shot_number
Specifies the shot number of the tree to open. Default = -1 which is the model.
https://www.mdsplus.org/index.php/Documentation:Reference:TCL_0
And although unrelated to data compression, note that your second screenshot above shows an insertion date of `16-Nov-1858`, which likely means that the system clock is not set correctly on one of the computers you are using.
MIT is closed this week. When I return to work in January, I will resume investigation of this issue.
Hi @margomw -- Also note that the `d3dpcs_965534.datafile` is unusually small, namely ~7.6 MB.
The MDS wrapper that Keith wrote calls this:
return MdsPut2((char *)node, "BUILD_WITH_UNITS($, $)", &d1, (void *)val, &d2, units, &null);
Is it possible that a certain data type is uncompressible?
Do you recommend using MdsPut vs. MdsPut2?
I don't think the MDSplus server configuration is a factor, as the files are written locally.
Hi @margomw -- Thanks for the additional information. Looks like the wrapper is written in the C language, so I will experiment with that API when I return to work after the holiday break.
Regarding your questions, here are some preliminary answers.
- The data being written to `d3dpcs_965534` is definitely compressible. I used TCL's `compress` command on a copy of the shot. Compression reduced the datafile to ~3.4 MB, which is ~44% of the original size of ~7.6 MB.
- The `MdsPut2()` function is recommended because it avoids some issues that optimizing compilers can cause with the original `MdsPut()`.
- Because the model and shots are all local on the computer, the compression problem is most likely an issue with the programs associated with the local workflow (dynamically creating the model tree, creating the pulse, writing data to the pulse).
- The insertion date of `16-Nov-1858` in your screenshot likely means that the clock on the local computer is wrong.
- Unlikely given how long the C API has been around, but perhaps it contains a bug. I will exercise the API when I return to work after the holidays.
The insertion time being `16-Nov-1858` is probably not about the clock being wrong. This is suspiciously close to the OpenVMS epoch, on which the time representation is based:
The operating system maintains the current date and time in 64-bit format. The time value is a binary number in 100-nanosecond (ns) units offset from the system base date and time, which is 00:00 o'clock, November 17, 1858 (the Smithsonian base date and time for the astronomic calendar).
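To make the arithmetic concrete, here is a small, hypothetical C sketch of that relationship (the helper name and the zero-value example are made up; it assumes the usual VMS/MDSplus convention of 100 ns ticks since 17-Nov-1858 00:00). An unset (zero) quadword therefore renders as a date at the 1858 base date; the one-day difference to 16-Nov is presumably just a local-timezone rendering of that instant.

```c
/* Hypothetical sketch: convert a VMS/MDSplus quadword time (100 ns ticks
 * since 17-Nov-1858 00:00) to Unix time.  A zero quadword -- e.g. an
 * insertion time that was never set -- maps back to the 1858 base date. */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Seconds between 17-Nov-1858 (MJD 0) and 01-Jan-1970 (MJD 40587). */
#define VMS_TO_UNIX_OFFSET_SEC 3506716800LL

static time_t vms_to_unix(int64_t ticks_100ns)
{
  return (time_t)(ticks_100ns / 10000000LL - VMS_TO_UNIX_OFFSET_SEC);
}

int main(void)
{
  time_t t = vms_to_unix(0);                /* an unset insertion time */
  struct tm *tm = gmtime(&t);               /* needs pre-1970 support  */
  if (tm)
    printf("%s", asctime(tm));              /* a date in November 1858 */
  return 0;
}
```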
`MdsPut2` does not do any tree editing or set node attributes. I would suspect that there is some code that is editing the tree and re-adding the node, and that the C/C++ API that adds nodes does not default to `compress_on_put`.
Hi @margomw -- Thank you for having me re-open this issue. I wrote a dinky C program and confirmed that `MdsPut2()` is clearing the `compress_on_put` flag (and other flags too) on a local tree.
NOTE: Based on inspection of the source code, it is likely that this bug only affects local trees.
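For reference, below is a hypothetical sketch of that kind of check, not the actual test program: it reads a node's COMPRESS_ON_PUT characteristic through a GETNCI expression using the MdsLib API, so it can be called before and after the wrapper's MdsPut2() call. The tree name matches the PCS tree, but the node path and shot number are illustrative, and the MdsValue()/descr() conventions follow the MdsLib documentation as I recall them and should be checked against mdslib.h.

```c
/* Hypothetical sketch: read a node's COMPRESS_ON_PUT flag via GETNCI.
 * Call compress_on_put_flag() before and after the put under test to see
 * whether the flag survives.  Node path and shot number are illustrative. */
#include <stdio.h>
#include <mdslib.h>   /* MdsOpen, MdsValue, descr, DTYPE_LONG */

static int compress_on_put_flag(const char *node)
{
  char expr[256];
  int dtype_long = DTYPE_LONG;
  int flag = -1, null = 0, retlen = 0;
  int ans = descr(&dtype_long, &flag, &null);   /* scalar answer descriptor */

  snprintf(expr, sizeof(expr), "GETNCI(%s,\"COMPRESS_ON_PUT\")", node);
  int status = MdsValue(expr, &ans, &null, &retlen);
  return (status & 1) ? flag : -1;              /* -1 on error */
}

int main(void)
{
  int shot = 965534;                            /* example shot from this thread */
  if (!(MdsOpen("d3dpcs", &shot) & 1))
    return 1;

  printf("before put: %d\n", compress_on_put_flag("\\TOP:MYSIG"));
  /* ... the wrapper's MdsPut2(...) call under test would go here ... */
  printf("after  put: %d\n", compress_on_put_flag("\\TOP:MYSIG"));
  return 0;
}
```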
As a temporary workaround, here are two suggestions:
- Switch to the `MdsPut()` function -- I confirmed that this does preserve the `compress_on_put` flag. (Note though that in order for `MdsPut()` to work reliably, it might be necessary to compile your source code with minimal optimization.) A sketch of this follows the list.
- Or, after all the PCS data is written to the tree, use the `compress` command of `mdstcl` to reduce the size of the data file.
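For the first suggestion, here is a minimal, hypothetical sketch of what the `MdsPut()` form of the wrapper call quoted earlier might look like. It assumes the shot tree has already been opened with `MdsOpen()`; the function name and arguments are illustrative, and the `descr()` conventions (including the string-descriptor form for the units argument) follow the MdsLib documentation as I recall them and should be checked against `mdslib.h`.

```c
/* Hypothetical sketch of the MdsPut() workaround.  With MdsPut(), the data
 * pointers travel inside the descriptors built by descr(), so only the
 * descriptor indices are passed -- compare the wrapper's MdsPut2() call,
 * which interleaves descriptor indices with raw data pointers. */
#include <string.h>
#include <mdslib.h>   /* MdsPut, descr, DTYPE_FLOAT, DTYPE_CSTRING */

static int put_with_units(const char *node, float *val, int nval, char *units)
{
  int null = 0;
  int dtype_float = DTYPE_FLOAT;
  int dtype_cstring = DTYPE_CSTRING;
  int ulen = (int)strlen(units);

  int d1 = descr(&dtype_float, val, &nval, &null);      /* data array   */
  int d2 = descr(&dtype_cstring, units, &null, &ulen);  /* units string */

  /* Same TDI expression as the wrapper, but via MdsPut(). */
  return MdsPut((char *)node, "BUILD_WITH_UNITS($, $)", &d1, &d2, &null);
}
```

The second suggestion needs no code change: run the `compress` command from `mdstcl` on the finished shot, as described earlier in this thread.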
Meanwhile, I will continue investigating to find out why `MdsPut2()` is clearing all flags.
Hi @margomw and @kgerickson -- There is now a fix for the `MdsPut2()` bug that was clearing the `compress_on_put` flag. The fix will be merged to alpha as soon as we have finished maintenance work on the build server. (Plus, additional testing will also be done.)
Manual testing shows that the `MdsPut2()` fix works for local and distributed trees.
The next task is to test the fix with remote trees accessed via mdsip (aka "thin client").
Also confirmed that `MdsPut()` is a viable workaround (although a reduced level of compiler optimization might be needed).
There is a bug in `MdsPut2()` for local files (i.e., an obvious typo). However, testing the fix with `mdsip` between two VMs is still underway. Currently, it appears that the bug exists on Ubuntu 20 (x86_64) and Rocky 9.1 (arm64). Surprisingly, though, the bug doesn't appear on Ubuntu 22 (x86_64 and arm64). However, the fix works fine on all platforms.
The next step is to test the bug and fix over `mdsip` between two Rocky 9.1 VMs, and to repeat the Ubuntu 20 (x86_64) test just to double-check previous findings.
The fix was fully tested (manually) on two Rocky 9.3 (arm64) VMs. Test results confirm the fix is good.
Here are the findings . . .
- The node attributes (including `compress_on_put`) are only getting clobbered by `MdsPut2()`. No problems were seen with the other function, `MdsPut()`.
- Furthermore, the bug only affects two modes of operation: local trees and "distributed" trees. The fix will be needed on any computers that use `MdsPut2()` with these two modes.
- The bug does not affect "thin-client" operation (i.e., use of `mdsconnect` to write remote trees via `mdsip`). Thus, although it is advisable to upgrade mdsip servers to this bug fix, it is not essential to do so.
To ensure there are no oddities with the arm64 architecture, the above manual tests will now be repeated on Rocky 9.3 (x86_64) VMs.