
I know how to use the TCL COMPRESS command to compress a pulse file, but how can I decompress or restore it?

Open Merlencer opened this issue 4 years ago • 19 comments

I know how to use the TCL COMPRESS command to compress the pulse file, but how can I decompress or restore it? My datafile is now a compressed file. How can I restore it to the previous, uncompressed file?

Merlencer avatar Jul 29 '20 02:07 Merlencer

There is no API to do that; the data is decompressed on the fly when reading it. It could be done by adapting some of the tools from TreeCleanupDatafile and TreeSegment. May I ask why you want to decompress it? If this is a suitable use case, we might add a feature to do so in some future release.

zack-vii avatar Jul 29 '20 08:07 zack-vii
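
Since decompression is transparent on read, no special call is needed. A minimal sketch, assuming the MDSplus Python bindings; the tree name, shot number, and node path are placeholders:

# reading from a compressed tree looks exactly like reading from an
# uncompressed one; the library decompresses on the fly.
from MDSplus import Tree

tree = Tree('my_tree', 12345)                  # placeholder tree and shot
data = tree.getNode('\\TOP:SIGNAL1').data()    # returns decompressed values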

When the compress command is issued, the data file for the tree is replaced. If you want to go back, make a copy of it before compressing. For current experiments at MIT our tree lifecycle looks like:

new -- copied to archive -->

archive -- compressed to on-line archive -->

archive -- permanent storage --> (this is the reference copy of the data as it was taken)

compressed on-line archive -- back up nightly -->

We also run daily snapshots of models, new, and the compressed on-line archive.

-Josh


joshStillerman avatar Jul 29 '20 13:07 joshStillerman
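
A hedged sketch of the "copy first, then compress" step, in Python. It assumes the MDSplus Python bindings and their compressDatafile() wrapper (the TCL COMPRESS command mentioned above is the documented route); all paths, the tree name, and the file pattern are placeholders:

# keep a reference copy BEFORE compressing, since COMPRESS rewrites the
# tree's datafile in place and there is no decompress API.
import glob
import shutil

from MDSplus import Tree

SHOT = 12345                                    # placeholder shot number
for f in glob.glob('/trees/new/my_tree_%d.*' % SHOT):
    shutil.copy(f, '/trees/archive/')           # reference copy of the data

# assumed Python wrapper for the compress step; check your version,
# TCL COMPRESS is the equivalent.
Tree('my_tree', SHOT).compressDatafile()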

We compress some of the data files. If a data file is frequently accessed, we need to decompress it in order to improve access speed, because decompression is much slower than the network transmission.


Merlencer avatar Jul 30 '20 02:07 Merlencer

Do you mean that you keep a compressed file for archiving, and another uncompressed one for others to access?


Merlencer avatar Jul 30 '20 06:07 Merlencer

As far as the current API goes, you would have to manually read the compressed data and store it without compression in a temporary tree. But I think what Josh suggested is: you store the data uncompressed during the shot; after the shot you create a copy in an archive location and compress it there. You keep the uncompressed original until the phase of high demand is over. The tree path would list the locations in order, e.g.: /trees/new/~t;/trees/model/~t;/trees/archive/~t;

This would solve it for new data at least.

zack-vii avatar Jul 30 '20 07:07 zack-vii
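
A hedged sketch of the manual route just described: read from the compressed tree (decompressed on the fly) and rewrite into a temporary tree without compression. It assumes the MDSplus Python bindings; tree names, the node name, and the shot number are placeholders, and a real script would loop over all nodes:

from MDSplus import Tree

SHOT = 12345
src = Tree('my_tree', SHOT, 'READONLY')

# build a temp tree (requires a my_tree_tmp_path environment variable)
dst = Tree('my_tree_tmp', SHOT, 'NEW')
node = dst.addNode('SIGNAL1', 'SIGNAL')
node.compress_on_put = False        # keep the copy uncompressed (assumed property)
dst.write()
dst.close()

# reopen for data writing and copy the record across
dst = Tree('my_tree_tmp', SHOT)
dst.getNode('SIGNAL1').putData(src.getNode('SIGNAL1').getData())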

I am saying we have scripts that take new shots and compress them into a new place called archives. We have recently become even more conservative:

New shots are copied to a place called backups before compression, and that copy is archived off-site.

New shots are then compressed into a place called archives, and that is backed up off-site.

Note: most of the data is already compressed, since the nodes are 'compress-on-put'.


joshStillerman avatar Jul 30 '20 13:07 joshStillerman
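
A minimal sketch of turning on compress-on-put, assuming the MDSplus Python bindings; the tree and node names are placeholders, and compress_on_put as a settable property is an assumption to verify against your version:

from MDSplus import Tree

# set the flag in the model tree (shot -1) so new shots inherit it;
# data written to the node is then compressed as it is stored.
model = Tree('my_tree', -1, 'EDIT')
model.getNode('\\TOP:SIGNAL1').compress_on_put = True
model.write()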

Thank you for your patience in explaining these problems to me. I have only used MDSplus for a few days. I saw you mentioned 'compress-on-put' and I'm trying to use it.


Merlencer avatar Jul 31 '20 01:07 Merlencer

Thank you for your patience in explaining these problems to me.


Merlencer avatar Jul 31 '20 01:07 Merlencer

We always operate with compress-on-put turned on for almost all nodes. The compress step does a further cleaning out of records that have been replaced by multiple writes to the same node. So:

1 - we acquire data mostly (all?) compress-on-put into 'new-shots'. This is where the shots are accessed on the day of the experiment. We use SSDs to hold 'new-shots'.

2 - we copy the files as-is to 'archived-shots' and then save them to tape. This is the copy of record.

3 - we compress the new-shots into our on-line archive storage and do incremental backups of that. This is the live copy of the old data.

I can share the scripts that we use for these purposes if you like; they would need to be modified to suit your setup.

-Josh


joshStillerman avatar Jul 31 '20 13:07 joshStillerman
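
The "further cleaning out" of superseded records has a hedged one-liner in the Python bindings; cleanDatafile() (assumed to wrap the TreeCleanupDatafile tooling mentioned earlier) should be checked against your installation:

from MDSplus import Tree

# reclaim space from records overwritten by repeated puts; the compress
# step performs the same cleanup while also compressing.
Tree('my_tree', 12345).cleanDatafile()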

I'm working on a project. We have thousands of channels, so the amount of data is very large, and the data needs to be kept permanently. What I want to do is reduce the hard disk space occupied by the stored data while ensuring its accuracy. Since I have only just started using MDSplus, I am not sure whether the accuracy of the data changes after compression, which is why I am asking these questions. I would be glad to have those scripts; I need them. Thank you.


Merlencer avatar Aug 03 '20 01:08 Merlencer

The built-in (default) compression algorithm is lossless; you will get the original data back. However, it is only effective for integer data up to 32 bit (or strings): the compression chunk size is at most 32 bits and is based on the integer diff ((int*)x)[1:] - ((int*)x)[:-1]. Since int64 is too large and floating-point values have a different bit structure, you may not get a lot out of it. If you are in control of the data storage, make sure you split the data into raw values, e.g. int16, and a scaling term (setSegmentScaling() or MAKE_SIGNAL($VALUE * _SLOPE + _OFFSET, _RAW, _DIM)).

zack-vii avatar Aug 03 '20 12:08 zack-vii
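
A hedged sketch of the split recommended above, storing raw int16 counts plus a scaling expression so the integer-delta compressor has something to bite on. It assumes the MDSplus Python bindings; counts, times, slope, and offset are placeholder data:

import numpy as np

from MDSplus import Data, Float64Array, Int16Array, Tree

counts = np.arange(1000, dtype=np.int16)   # placeholder raw ADC counts
times = np.linspace(0.0, 1.0, 1000)        # placeholder timebase
slope, offset = 1.0e-3, 0.0                # placeholder calibration

# $VALUE refers to the raw part when the signal is evaluated, so readers
# get calibrated float values while only int16 data sits on disk.
sig = Data.compile('BUILD_SIGNAL($VALUE * $ + $, $, $)',
                   slope, offset, Int16Array(counts), Float64Array(times))
Tree('my_tree', 12345).getNode('\\TOP:SIGNAL1').putData(sig)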

The compression is done automatically and is lossless. It uses delta-value compression: a value is followed by small bit-length deltas from that value until a value is encountered that cannot be represented by a delta; then come a flag and a full bit-width value, followed again by a series of deltas.

Here are the scripts as an attachment.


joshStillerman avatar Aug 03 '20 13:08 joshStillerman
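
A toy illustration of the delta idea described above, not the actual MDSplus packer, just the concept that small differences need few bits and the round trip is exact:

import numpy as np

x = np.array([1000, 1002, 1001, 1005, 9000, 9001], dtype=np.int32)
deltas = np.diff(x)   # mostly tiny values; the jump to 9000 is what would
                      # force the flag + full bit-width value described above
restored = np.concatenate(([x[0]], x[0] + np.cumsum(deltas)))
assert np.array_equal(restored, x)   # lossless round trip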

Our current database is still being improved. At present, most of the data is int16 coming from the A/D conversion and is then converted to float32, but there is still some float32 data. I would like to confirm whether the accuracy of that float32 data will be lost after compression.


Merlencer avatar Aug 04 '20 01:08 Merlencer

Is it that I don't know how to use GitHub or email? I can't find the attachment...


Merlencer avatar Aug 04 '20 02:08 Merlencer

GitHub discards attachments; Josh will have to send them to your email.

@joshStillerman : we could add the scripts to the repo under ./scripts/shot-cycle or similar.

zack-vii avatar Aug 04 '20 02:08 zack-vii

It's always lossless, but in some cases the result is not smaller, and then the compressed version is discarded in favour of the original data. The ratio of compressed vs. uncompressed is reflected in the RLENGTH vs. LENGTH NCI:

GETNCI(PATH:TO:NODE, "RLENGTH")
GETNCI(PATH:TO:NODE, "LENGTH")

or in the rlength vs. length fields of the TreeNode in Python.

zack-vii avatar Aug 04 '20 08:08 zack-vii
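
A quick check of how well a node compressed, assuming the MDSplus Python bindings; tree and node names are placeholders:

from MDSplus import Tree

# rlength is the bytes occupied on disk, length the uncompressed size;
# equal values mean compression did not shrink the record and was discarded.
node = Tree('my_tree', 12345).getNode('\\TOP:SIGNAL1')
print('on disk: %d bytes, uncompressed: %d bytes' % (node.rlength, node.length))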

Thank you for answering so many questions for me.


Merlencer avatar Aug 05 '20 00:08 Merlencer

I also want to ask a question. At present I have built a new tree (actually a folder) named my_tree. Since we have done more than 10,000 experiments, there are tens of thousands of files in this folder. Obviously, these files are not easy to manage. How can we manage these tens of thousands of files (and even more in the future) more effectively, while still accessing the data in these files through my_tree?


Merlencer avatar Aug 06 '20 06:08 Merlencer

Sorry, I was sure I had answered that via email. Please check the official documentation https://mdsplus.org/index.php?title=Documentation:TreeAccess, in particular the ~a-j and ~t syntax.

zack-vii avatar Sep 05 '20 06:09 zack-vii
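
For reference, a hedged sketch of that syntax: ~t expands to the tree name and ~a through ~j to digits of the shot number, which lets you hash tens of thousands of shot files into subdirectories. The two-digit layout below is an assumption; check the TreeAccess page for the exact digit-order convention:

import os

# e.g. shot 12345 would then resolve inside a two-digit subdirectory;
# the trailing plain path is a fallback for files not yet moved.
os.environ['my_tree_path'] = '/trees/my_tree/~b~a;/trees/my_tree'

from MDSplus import Tree

tree = Tree('my_tree', 12345)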