gtkwave icon indicating copy to clipboard operation
gtkwave copied to clipboard

RFC: New savefile format

Open rfuest opened this issue 2 years ago • 7 comments

The current savefile format isn't easy to read or to generate by an external tool. I would like to introduce a new file format for GTKWave 4. At the moment I'm planning to use JSON to store the save file. While I'm not a huge fan of JSON myself it has the advantage that it is very common and can easily be used in many programming languages or modified by tools like jq. And with a modern editor with JSON support it isn't too bad to edit by hand.

In a previous discussion about a new savefile format TCL has been suggested as a new file format, but I prefer a serialization format like JSON for two reasons. The first are possible security implications of using a scripting language as a savefile format. While it is unlikely that this would be exploited, I just don't like the idea of allowing arbitrary code execution when opening a file. The second reason is that it wouldn't be possible to preserve custom code in the TCL file if a savefile is overwritten by GTKWave. If, for example, a user has added signals in a loop inside the TCL file and then overwrites the file using "Save" in the UI any custom code would be gone. Using TCL as a savefile has it's use cases, like https://github.com/gtkwave/gtkwave/pull/114, but I think it shouldn't be the primary savefile format and instead be a secondary format that can be opened read only and exported.

JSON Example

This example shows a rough idea of what I have in mind for a JSON format.

grafik

Current format

[*]
[*] GTKWave Analyzer v3.3.115 (w)1999-2023 BSI
[*] Thu Jun 15 17:32:40 2023
[*]
[dumpfile] "/home/ralf/git/gtkwave/gtkwave/gtkwave3-gtk3/examples/des.fst"
[dumpfile_mtime] "Mon Oct 11 15:44:12 2021"
[dumpfile_size] 158102
[savefile] "/tmp/old.gtkw"
[timestart] 0
[size] 1000 600
[pos] 1881 387
*-3.864544 5 8 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
[markername] AMarker
[treeopen] top.
[treeopen] top.des.
[sst_width] 273
[signals_width] 94
[sst_expanded] 1
[sst_vpaned_height] 157
@29
top.clk
@22
top.ct[1:64]
@10420
top.i
@20000
-
@800200
-round1
@22
[color] 1
top.des.round1.b1x[1:6]
[color] 2
top.des.round1.b2x[1:6]
[color] 3
top.des.round1.b3x[1:6]
@1000200
-round1
[pattern_trace] 1
[pattern_trace] 0

New JSON based format

{
    "version": 1,
    "generator": "GTKWave 4.0.0",
    "dumpFile": "des.fst",
    "timeRange": {
        "start": 0,
        "end": 42
    },
    "primaryMarker": 5,
    "namedMarkers": {
        "Marker": 8
    },
    "traces": [
        "top.clk",
        "top.ct[1:64]",
        {
            "type": "analog",
            "name": "top.i",
            "analogMode": "interpolation",
            "height": 2
        },
        {
            "type": "group",
            "name": "round1",
            "children": [
                {
                    "name": "top.des.round1.b1x[1:6]",
                    "color": "red"
                },
                {
                    "name": "top.des.round1.b2x[1:6]",
                    "color": "orange"
                },
                {
                    "name": "top.des.round1.b3x[1:6]",
                    "color": "yellow"
                }
            ]
        }
    ]
}

Notes:

  • version and generator are used to handle future changes to the file format. version is the file format version, which will be incremented if the file format changes in the future. Older GTKWave version will still try to open a file with a higher version number but will display a warning that the file was created with a newer version and that it might not be displayed correctly.
  • dumpFile is using a relative path by default.
  • The save file no longer contains the date/time when it was last modified. IMO tracking of file versions should be handled by version control instead.
  • timeRange replaces the start time and zoom factor in the old format. It's much easier to specify two times than a logarithmic zoom factor.
  • namedMakers will be stored as a list. There won't be a fixed number of named markers anymore and only the used markers need to be specified in the savefile.
  • The are now stored as a tree instead of a list. A trace can have children, like the group in the example.
  • The numerical flags will be replaced by named fields in the trace objects. Like the color field in the example.
  • Analog height extensions will be replaced by a height field. The height is defined as a multiple of the base trace height.

The exact structure of the file isn't fixed yet, but this should give a good overview of how a JSON based savefile format could look like. Any feedback for improvements or alternatives is appreciated.

rfuest avatar Jun 15 '23 18:06 rfuest

That looks good, and...sane.  The existing savefile format went through a couple of development phases:

  1. single character delimiter followed by a payload for the command
  2. [whatever] formatted commands similar to rcfile formats going back a decade or two.3) complete and total stagnation

That said, there is no compelling reason it should be as it currently is, only that it evolved over time with a hump on its back, but the hump wasn't too terrible of a deformity so I never changed it "because it works."

The only tricky part with savefiles during loading is the need to do two passes over it in order to handle non-memory resident formats:

  1. mark each "facility" that will be loaded (pass 1)
  2. kick off loading on all marked facilities via callbacks or iteration (end of pass 1)
  3. insert formatted traces into the viewer (pass 2, or reading a caching of pass 1)

The justification for that is so we only have to do a single pass through the database and holds for formats such as FST, FSDB, etc.  For toy sized files, nobody notices, but for very large dumpfiles, you definitely need to be able to mark off signals for a bulk, single pass load.  This is easy to do via various methods, but we don't want to query the database one signal at a time. One thing to keep in mind is that if I remember correctly, cut-and-paste does some weird stuff either using an in-memory version of save files or some kind of Tcl trickery.  I don't remember the format used as this was an add by a developer from Concept Engineering around 2008-ish, but if you cut and paste from gtkwave signals pane into Emacs (seriously), you will see the format used as the cut-and-paste payload is plaintext ASCII.

-Tony

On Thursday, June 15, 2023 at 02:34:40 PM EDT, Ralf Fuest ***@***.***> wrote:  

The current savefile format isn't easy to read or to generate by an external tool. I would like to introduce a new file format for GTKWave 4. At the moment I'm planning to use JSON to store the save file. While I'm not a huge fan of JSON myself it has the advantage that it is very common and can easily be used in many programming languages or modified by tools like jq. And with a modern editor with JSON support it isn't too bad to edit by hand.

In a previous discussion about a new savefile format TCL has been suggested as a new file format, but I prefer a serialization format like JSON for two reasons. The first are possible security implications of using a scripting language as a savefile format. While it is unlikely that this would be exploited, I just don't like the idea of allowing arbitrary code execution when opening a file. The second reason is that it wouldn't be possible to preserve custom code in the TCL file if a savefile is overwritten by GTKWave. If, for example, a user has added signals in a loop inside the TCL file and then overwrites the file using "Save" in the UI any custom code would be gone. Using TCL as a savefile has it's use cases, like #114, but I think it shouldn't be the primary savefile format and instead be a secondary format that can be opened read only and exported.

JSON Example

This example shows a rough idea of what I have in mind for a JSON format.

Current format [] [] GTKWave Analyzer v3.3.115 (w)1999-2023 BSI [] Thu Jun 15 17:32:40 2023 [] [dumpfile] "/home/ralf/git/gtkwave/gtkwave/gtkwave3-gtk3/examples/des.fst" [dumpfile_mtime] "Mon Oct 11 15:44:12 2021" [dumpfile_size] 158102 [savefile] "/tmp/old.gtkw" [timestart] 0 [size] 1000 600 [pos] 1881 387 *-3.864544 5 8 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 [markername] AMarker [treeopen] top. [treeopen] top.des. [sst_width] 273 [signals_width] 94 [sst_expanded] 1 [sst_vpaned_height] 157 @29 top.clk @22 top.ct[1:64] @10420 top.i @20000

@800200 -round1 @22 [color] 1 top.des.round1.b1x[1:6] [color] 2 top.des.round1.b2x[1:6] [color] 3 top.des.round1.b3x[1:6] @1000200 -round1 [pattern_trace] 1 [pattern_trace] 0

New JSON based format { "version": 1, "generator": "GTKWave 4.0.0", "dumpFile": "des.fst", "timeRange": { "start": 0, "end": 42 }, "primaryMarker": 5, "namedMarkers": { "Marker": 8 }, "traces": [ "top.clk", "top.ct[1:64]", { "type": "analog", "name": "top.i", "analogMode": "interpolation", "height": 2 }, { "type": "group", "name": "round1", "children": [ { "name": "top.des.round1.b1x[1:6]", "color": "red" }, { "name": "top.des.round1.b2x[1:6]", "color": "orange" }, { "name": "top.des.round1.b3x[1:6]", "color": "yellow" } ] } ] } Notes:

  • version and generator are used to handle future changes to the file format.
    version is the file format version, which will be incremented if the file format changes in the future. Older GTKWave version will still try to open a file with a higher version number but will display a warning that the file was created with a newer version and that it might not be displayed correctly.
  • dumpFile is using a relative path by default.
  • The save file no longer contains the date/time when it was last modified. IMO tracking of file versions should be handled by version control instead.
  • timeRange replaces the start time and zoom factor in the old format. It's much easier to specify two times than a logarithmic zoom factor.
  • namedMakers will be stored as a list. There won't be a fixed number of named markers anymore and only the used markers need to be specified in the savefile.
  • The are now stored as a tree instead of a list. A trace can have children, like the group in the example.
  • The numerical flags will be replaced by named fields in the trace objects. Like the color field in the example.
  • Analog height extensions will be replaced by a height field. The height is defined as a multiple of the base trace height.

The exact structure of the file isn't fixed yet, but this should give a good overview of how a JSON based savefile format could look like. Any feedback for improvements or alternatives is appreciated.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you are subscribed to this thread.Message ID: @.***>

gtkwave avatar Jun 15 '23 21:06 gtkwave

The only tricky part with savefiles during loading is the need to do two passes over it in order to handle non-memory resident formats: 1) mark each "facility" that will be loaded (pass 1) 2) kick off loading on all marked facilities via callbacks or iteration (end of pass 1) 3) insert formatted traces into the viewer (pass 2, or reading a caching of pass 1)

My plan is to read the entire savefile first before opening the corresponding dump file. At that point it should be easy to use methods like fst_import_masked to import all traces at once.

One thing to keep in mind is that if I remember correctly, cut-and-paste does some weird stuff either using an in-memory version of save files or some kind of Tcl trickery. I don't remember the format used as this was an add by a developer from Concept Engineering around 2008-ish, but if you cut and paste from gtkwave signals pane into Emacs (seriously), you will see the format used as the cut-and-paste payload is plaintext ASCII.

I did already encounter this and don't plan too keep it. If I remember correctly it is primarily used for copy and paste between different instances of GTKWave. The code to generate and parse the pseudo Tcl is very complicated and at least in the past it was easy to segfault GTKWave by pasting some random text, which GTKWave then tried to interpret as this pseudo Tcl. I did fix some of this issues but I'm not sure if the fixes made their way into the GIT repo.

I'm planning to overhaul the entire DnD and clipboard operations as part of the move from a linear list of traces to a tree structure. The current GLOBALS->traces.buffer will be replaced by a custom GObejct, which can be transferred between instances of GTKWave without manually serializing/deserializing it. This also allows multiple cut/past buffers to exist at the same time, which removes the need for backing up the global trace buffer into a TempBuffer for some operations.

rfuest avatar Jun 16 '23 14:06 rfuest

For gtkwave, primarily that is used for cut-and-paste between instances of gtkwave, and also (I think) between a lockstep twinwave -> gtkwave session unless there is different signaling of hierarchy vs signal used for that one.  I can't remember what I did for that and I may have shortcircuited the convention to something much simpler.

The original intended usage of the Tcl-like string passing was for Concept Engineering's own tool that is more-or-less a Verdi-like environment.  I think it was an earlier version of RTLvision Pro.  They forked off the gtkwave codebase long ago and it appears that since then they've written their own viewer, so modifications should have no downstream impact.

-Tony

On Friday, June 16, 2023 at 10:27:15 AM EDT, Ralf Fuest ***@***.***> wrote:  

The only tricky part with savefiles during loading is the need to do two passes over it in order to handle non-memory resident formats: 1) mark each "facility" that will be loaded (pass 1) 2) kick off loading on all marked facilities via callbacks or iteration (end of pass 1) 3) insert formatted traces into the viewer (pass 2, or reading a caching of pass 1)

My plan is to read the entire savefile first before opening the corresponding dump file. At that point it should be easy to use methods like fst_import_masked to import all traces at once.

One thing to keep in mind is that if I remember correctly, cut-and-paste does some weird stuff either using an in-memory version of save files or some kind of Tcl trickery. I don't remember the format used as this was an add by a developer from Concept Engineering around 2008-ish, but if you cut and paste from gtkwave signals pane into Emacs (seriously), you will see the format used as the cut-and-paste payload is plaintext ASCII.

I did already encounter this and don't plan too keep it. If I remember correctly it is primarily used for copy and paste between different instances of GTKWave. The code to generate and parse the pseudo Tcl is very complicated and at least in the past it was easy to segfault GTKWave by pasting some random text, which GTKWave then tried to interpret as this pseudo Tcl. I did fix some of this issues but I'm not sure if the fixes made their way into the GIT repo.

I'm planning to overhaul the entire DnD and clipboard operations as part of the move from a linear list of traces to a tree structure. The current GLOBALS->traces.buffer will be replaced by a custom GObejct, which can be transferred between instances of GTKWave without manually serializing/deserializing it. This also allows multiple cut/past buffers to exist at the same time, which removes the need for backing up the global trace buffer into a TempBuffer for some operations.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.***>

gtkwave avatar Jun 16 '23 18:06 gtkwave

For gtkwave, primarily that is used for cut-and-paste between instances of gtkwave, and also (I think) between a lockstep twinwave -> gtkwave session unless there is different signaling of hierarchy vs signal used for that one. I can't remember what I did for that and I may have shortcircuited the convention to something much simpler.

I would like to remove twinwave for GTKWave 4 and instead include similar functionality in GTKWave itself. The refactored backend no longer uses any global variables, which makes it possible to open multiple dump files without any tricks like swapping the GLOBALS struct. This should make it easy to display different files in a vertically split window.

rfuest avatar Jun 16 '23 18:06 rfuest

That sounds good.  Some of it split across two processes was an artifact of the 32-bit days.  When I was working on multiprocessor traces from actual IBM designs, those designs could easily chew up 3GB on waves, let alone all the other stuff, so any source code annotation was more easily done in another process.  I have no idea if the IBM AE2 interface code is even usable anymore so that loader probably can be removed.  It's been years since I've gotten any emails about it.  My guess is they may have migrated to FSDB by now since all the main developers I knew have retired or left.

-Tony

On Friday, June 16, 2023 at 02:26:20 PM EDT, Ralf Fuest ***@***.***> wrote:  

For gtkwave, primarily that is used for cut-and-paste between instances of gtkwave, and also (I think) between a lockstep twinwave -> gtkwave session unless there is different signaling of hierarchy vs signal used for that one. I can't remember what I did for that and I may have shortcircuited the convention to something much simpler.

I would like to remove twinwave for GTKWave 4 and instead include similar functionality in GTKWave itself. The refactored backend no longer uses any global variables, which makes it possible to open multiple dump files without any tricks like swapping the GLOBALS struct. This should make it easy to display different files in a vertically split window.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.***>

gtkwave avatar Jun 16 '23 21:06 gtkwave

Some of it split across two processes was an artifact of the 32-bit days. When I was working on multiprocessor traces from actual IBM designs, those designs could easily chew up 3GB on waves, let alone all the other stuff, so any source code annotation was more easily done in another process.

I'm glad that this isn't necessary anymore. I've also made the assumption that GTKWave 4 will primarily be run on 64 bit platforms and changed the value union in the HistEnt struct to always contain the double h_double variant (WAVE_HAS_H_DOUBLE is always defined). This removes a lot of ifdefs in the code and I don't think anyone with large designs, where memory consumption is an issue, still uses a 32 bit system.

I have no idea if the IBM AE2 interface code is even usable anymore so that loader probably can be removed. It's been years since I've gotten any emails about it. My guess is they may have migrated to FSDB by now since all the main developers I knew have retired or left.

At the moment I've removed some formats from the refactored code and AE2 is one of them. It wasn't possible to keep AE2 and FSDB working without being able to test the code, because I don't have access to the required libraries. Instead of keeping code with a high probability of being broken I did decide to remove them and let someone with access to the libraries work on restoring the format support later. Adding new formats will be easier in the refactored code, because there is now an internal API to add new filetypes. Later this API could be used to allow filetype plugins, so that these more specialized formats could also be maintained externally and independent of the main GTKWave codebase.

LXT 1 support is another missing format, because it was a pain to refactor the code and I didn't want to spend a lot of time on a format I don't use. The best way to restore LXT 1 support would be to first extract the LXT parsing into a library, like for the other formats. The mixed parsing and GTKWave code is very hard to understand or modify.

rfuest avatar Jun 17 '23 14:06 rfuest

Using a standard format such as JSON, YML, etc. will make everyone understand the syntax. That is a good change.

By the way, is there a definition -outside the source code itself- of what is saved in these files? It looks like some UI aspects, such as the time scale conversion gets lost in the process.

jotego avatar Oct 03 '23 05:10 jotego