mdsplus
I know how to use the TCL COMPRESS command to compress the pulse file, but how can I decompress the pulse file or restore it?
I know how to use the TCL COMPRESS command to compress the pulse file, but how can I decompress the pulse file or restore it? My datafile is now a compressed file. How can I restore it to the previous file?
There is no API to do that; the data is decompressed on-the-fly when it is read. It could be done by adapting some of the tools from TreeCleanupDatafile and TreeSegment. May I ask why you want to decompress it? If this is a suitable use case, we might add a feature for it in the future.
When the compress command is issued, the data file for the tree is replaced. If you want to go back, make a copy of it before compressing. For current experiments at MIT, our tree lifecycle looks like:
new -- copied to archive -->
archive -- compressed to on-line archive -->
archive -- permanent storage --> (this is the reference copy of the data as it was taken)
compressed on-line archive -- back up nightly -->
We also run daily snapshots of models, new, and the compressed on-line archive.
-Josh
We compress some of the data files. If a data file is frequently accessed, we need to decompress it in order to improve access speed, because the decompression speed is much lower than the network transmission speed.
Do you mean that you keep a compressed file for archiving, and another uncompressed one for others to access?
As far as the current API goes, you would have to manually read the compressed data and store it without compression in a temp tree. But I think what Josh suggested is: you store the data uncompressed during the shot; after the shot you create a copy in an archive location and compress it there; the uncompressed original you keep until the phase of high demand is over. The treepath would have the locations in order, e.g.: /trees/new/~t;/trees/model/~t;/trees/archive/~t;
That would solve it for new data, at least.
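For the manual route, something along these lines should work (untested sketch; tree name, shot numbers, and the scratch-shot convention are placeholders, and segmented nodes would need to be copied segment by segment with getSegment/makeSegment instead):

```python
# Read each node of the compressed pulse and rewrite it uncompressed into a
# scratch pulse created from the model. All names and numbers are hypothetical.
from MDSplus import Tree

TREE, SHOT, SCRATCH_SHOT = 'my_tree', 12345, 912345

Tree(TREE, -1).createPulse(SCRATCH_SHOT)    # empty pulse built from the model
src = Tree(TREE, SHOT, 'ReadOnly')
dst = Tree(TREE, SCRATCH_SHOT)

for src_node in src.getNodeWild('***'):
    if src_node.length == 0:                # nothing stored in this node
        continue
    dst_node = dst.getNode(src_node.fullpath)
    dst_node.compress_on_put = False        # store the copy uncompressed
    dst_node.putData(src_node.getRecord())  # reading decompresses on the fly
```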
I am saying we have scripts that take new shots and compress them into a new place called archives. We have recently become even more conservative:
new shots are copied to a place called backups before compression, and that is archived off-site elsewhere
new shots are then compressed into a place called archives, and that is backed up off-site elsewhere
Note: most of the data is already compressed since the nodes are 'compress-on-put'
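For reference, compress-on-put is a per-node flag that can be turned on in the model tree. A minimal sketch from Python (the tree name is a placeholder; setCompressOnPut(True) should be equivalent if your version lacks the property):

```python
# Hypothetical example: enable compress-on-put on every node of a model tree,
# so data is compressed as it is written with putData/makeSegment.
from MDSplus import Tree

model = Tree('my_tree', -1)            # open the model tree
for node in model.getNodeWild('***'):
    node.compress_on_put = True        # set the COMPRESS_ON_PUT NCI flag
```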
Thank you for your patience in explaining these problems to me. I have only used MDSplus for a few days. I saw that you mentioned 'compress-on-put' and I'm trying to use it.
Thank you for your patience in explaining these problems to me.
We always operate with compress-on-put turned on for almost all nodes. The compress step does a further cleaning out of records that have been replaced by multiple writes to the same node. So:
1 - we acquire data mostly (all?) compress-on-put into 'new-shots'. This is where the data is accessed on the day of the experiment. We use SSDs to hold 'new-shots'.
2 - we copy the files as-is to 'archived-shots' and then save them to tape. This is the copy of record.
3 - we compress the new-shots into our on-line archive storage and do incremental backups of that; this is the live copy of the old data.
I can share the scripts that we use for these purposes if you like; they would need to be modified to suit your setup.
-Josh
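A rough sketch of what such a post-shot script might do (untested; the directory layout, tree name, shot number, and the exact mdstcl invocation are placeholders to adapt to your site):

```python
# Hypothetical post-shot housekeeping: copy the raw shot files as-is to a
# backup and an archive area, then compress the archive copy with TCL COMPRESS.
import glob
import os
import shutil
import subprocess

TREE, SHOT = 'my_tree', 12345
NEW, BACKUP, ARCHIVE = '/trees/new', '/trees/backups', '/trees/archive'

# 1) copy the pulse files (my_tree_12345.tree/.characteristics/.datafile) as-is
for f in glob.glob('%s/%s_%d.*' % (NEW, TREE, SHOT)):
    shutil.copy2(f, BACKUP)    # copy of record, sent off-site/tape from here
    shutil.copy2(f, ARCHIVE)   # on-line copy that will be compressed

# 2) compress the archive copy in place; the TCL command line below is only a
#    placeholder -- use the same COMPRESS invocation you already use
tcl = 'set tree %s /shot=%d\ncompress\n' % (TREE, SHOT)
env = dict(os.environ, **{TREE + '_path': ARCHIVE})   # make TCL open the archive copy
subprocess.run(['mdstcl'], input=tcl, text=True, check=True, env=env)
```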
I'm working on a project. We have thousands of channels, so the amount of data is very large, and the data needs to be kept permanently. What I want to do is reduce the disk space used for data storage while ensuring the accuracy of the data. Since I have only just started using MDSplus, I was not sure whether the accuracy of the data changes after compression, so I asked you these questions. I would be glad to have those scripts; I need them. Thank you.
The built-in (default) compression algorithm is lossless; you will get the original data back. However, it is only effective for integer data up to 32 bits (or strings). The compression chunk size is a maximum of 32 bits and is based on the integer diff ((int*)x)[1:] - ((int*)x)[:-1]. Since int64 is too large and floating point has a different structure, you may not get a lot out of it. If you are in control of the data storage, make sure you split the data into raw values, e.g. int16, and a scaling term (setSegmentScaling() or MAKE_SIGNAL($VALUE * _SLOPE + _OFFSET, _RAW, _DIM)).
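For example, roughly (untested; tree, node, and calibration values are placeholders):

```python
# Store raw int16 counts plus a scaling expression, so the integers compress
# well while clients still read calibrated values. Names are hypothetical.
import numpy as np
from MDSplus import Tree, Data, Signal, Int16Array, Float64Array

node = Tree('my_tree', 12345).getNode('.SIGNALS:CH01')

counts = np.random.randint(-2048, 2048, 10000).astype(np.int16)  # raw ADC counts
times = np.arange(counts.size) * 1e-6                             # seconds
slope, offset = 10.0 / 4096, 0.0                                  # calibration

scaled = Data.compile('$VALUE * %g + %g' % (slope, offset))  # applied on read
node.putData(Signal(scaled, Int16Array(counts), Float64Array(times)))
# node.data() now returns the calibrated floats; only the int16 counts
# (which the delta compression handles well) are stored on disk.
```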
The compression is done automatically and is lossless. It uses delta-value compression: a value followed by small bit-length deltas from that value, until a value is encountered that cannot be represented by a delta; then a flag and a full bit-width value, followed again by a series of deltas.
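As a conceptual illustration only (this is not the actual MDSplus on-disk format, just the idea of a full value followed by small deltas with an escape for outliers):

```python
# Toy illustration of delta encoding with an escape to a full-width value.
def delta_encode(values, delta_bits=4):
    limit = 1 << (delta_bits - 1)            # signed deltas must fit delta_bits
    out, prev = [], None
    for v in values:
        if prev is not None and -limit <= v - prev < limit:
            out.append(('delta', v - prev))  # small delta from previous value
        else:
            out.append(('full', v))          # flag + full bit-width value
        prev = v
    return out

print(delta_encode([100, 101, 103, 99, 5000, 5001]))
# [('full', 100), ('delta', 1), ('delta', 2), ('delta', -4), ('full', 5000), ('delta', 1)]
```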
Here are the scripts as an attachment.
Our current database is being improved. At present, most of the data is int16, which is then converted to float32 after A/D conversion, but there is still some float32 data. I would like to confirm whether the accuracy of this float32 data will be lost after compression.
Maybe I don't know how to use GitHub or email properly, but I can't find the attachment...
GitHub discards attachments; Josh will have to send them to your email address.
@joshStillerman: we could add the scripts to the repo under ./scripts/shot-cycle or similar.
It's always lossless, but in some cases the result is not smaller and the compressed version is discarded in favour of the original data. The ratio of compressed vs. uncompressed size is reflected in the RLENGTH vs. LENGTH NCI:
GETNCI(PATH:TO:NODE, "RLENGTH")
GETNCI(PATH:TO:NODE, "LENGTH")
or the rlength vs. length fields of TreeNode in Python.
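For example, from Python (tree and node names are placeholders), using the RLENGTH = compressed / LENGTH = uncompressed reading above:

```python
from MDSplus import Tree

node = Tree('my_tree', 12345).getNode('.SIGNALS:CH01')
print('uncompressed bytes:', node.length)    # LENGTH
print('stored bytes:      ', node.rlength)   # RLENGTH
if node.length:
    print('ratio: %.2f' % (node.rlength / node.length))
```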
Thank you for answering so many questions for me.
I also want to ask a question. At present, I have built a new tree (actually a folder) named my_tree. Since we have done more than 10000 experiments, there are tens of thousands of files in this folder. Obviously, these files are not easy to manage. How can we manage these tens of thousands of files (and even more in the future) more effectively, while still accessing the data in them through my_tree?
Sorry, I was sure I had answered that via email. Please check the official documentation https://mdsplus.org/index.php?title=Documentation:TreeAccess, in particular the ~a-j and ~t syntax.
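If I read that page correctly, the ~ characters in the tree path are expanded per shot, so the path itself can fan the pulse files out into subdirectories instead of one huge folder. A hypothetical example (check the linked documentation for which shot-number digit each of ~a..~j stands for):

```python
# Point my_tree at shot-number-dependent subdirectories via the tree path
# environment variable; the exact ~ pattern here is only a placeholder.
import os
os.environ['my_tree_path'] = '/trees/my_tree/~e~d~c/'
```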