
Reduce compressed size of files in results/ folder, especially .odb files

Open oharboe opened this issue 2 years ago • 29 comments

Description

When working with large, long running designs, .odb compressed file sizes start to matter.

Especially when creating artifacts that are uploaded/downloaded to artifact servers, such as Bazel.

I take it as read that whatever artifact system is being used has compression built in, so it is not the raw .odb file size but the compressed .odb file size that needs to be reduced.

As far as this feature request is concerned, the file size on disk is not a major concern; upload/download speed is.

Suggested Solution

Reduce compressed size of .odb files.

Additional Context

No response

oharboe avatar Nov 09 '23 19:11 oharboe

We could theoretically add a gz option to the writer.

QuantamHD avatar Nov 10 '23 05:11 QuantamHD

We could theoretically add a gz option to the writer.

Surely Bazel, or whatever artifact system is being used, has compression already...

oharboe avatar Nov 10 '23 06:11 oharboe

Looks like it's not on by default (see https://bazel.build/reference/command-line-reference#flag--remote_cache_compression), and there's zero chance that CMake and/or make are doing any kind of artifact compression.

https://www.buildbuddy.io/blog/how-bazel-5-0-makes-your-builds-faster/

QuantamHD avatar Nov 10 '23 06:11 QuantamHD

Google Cloud Buckets can be configured to do compression, so this might not even be the job of Bazel.

Certainly before changing ORFS/OpenROAD, we should try to find out if this problem is solved elsewhere for important use-cases.

A long time ago, I tried to reduce disk space for ORFS by using symbolic links instead of copying .odb files, but make doesn't play nice with symbolic links.

For my wafer-thin Bazel layer on top of ORFS, I only keep one .odb file each for the floorplan, place, cts, route, and final stages.

oharboe avatar Nov 10 '23 06:11 oharboe

My understanding of that feature is that it reduces at-rest storage costs. GCS decompresses it on the fly over the wire, so you're paying the full network bandwidth cost. https://cloud.google.com/storage/docs/transcoding

What bottleneck are you actually running into?

  • Are your builds slow because remote artifact downloads are slow?
  • Are you running into high storage prices?
  • Is bazel failing to download some large file?

More details would help narrow the solution space. There aren't a lot of easy size optimizations we could make in odb beyond adding compression.

QuantamHD avatar Nov 10 '23 06:11 QuantamHD

I read a bit about Bazel. It supports compression, but also de-duplication. So if .odb files were reworked to work well with the de-duplication and built-in Bazel compression, there is significant savings potential.

This requires significant digging to figure out what it would mean... Split into multiple files so that data shared between .odb files can be detected as duplicate by the artifact system?

git too has an artifact system, which compresses by default.

oharboe avatar Nov 10 '23 06:11 oharboe

I think the reality is that would be a lot of work and complexity for gains on very specific systems. I don't think the juice is worth the squeeze.

What are you primarily trying to optimize? Storage cost or fetch speed?

I would also advise against storing odb files or other artifact-like things in git. There's not a great solution for long-term caching of these types of files, but putting them in git will store them in your history forever and will quickly run you into repo limits on git hosting platforms.

QuantamHD avatar Nov 10 '23 06:11 QuantamHD

My default assumption is that it is unnecessary to add compression to ORFS, and that the job of ORFS is to enable existing compression (wherever it comes from) to work effectively.

Regarding Google Cloud: gsutil has a -Z and a -J compression option. The first compresses on the server, the second compresses on the client (over the wire). W.r.t. ORFS, I assume that Bazel has a way to exploit this.

oharboe avatar Nov 10 '23 06:11 oharboe

Yeah, I think Bazel is pretty agnostic to storage. I think the only way you get compression is with that flag I mentioned previously. Also, my proposal is not to add compression to ORFS, but inside OpenROAD, so that if you run

write_db mydb.odb.gz it will be streamed into a gzip file. That would let you interact with pretty much any build system.
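
For illustration, a minimal sketch of what the writer side could look like with Boost.Iostreams (the function name and stream handling here are assumptions, not the actual odb code):

#include <fstream>
#include <string>

#include <boost/iostreams/filter/gzip.hpp>
#include <boost/iostreams/filtering_stream.hpp>

// Hypothetical sketch: pick gzip output when the file name ends in ".gz".
void writeDbSketch(const std::string& filename)
{
  std::ofstream file(filename, std::ios::binary);
  boost::iostreams::filtering_ostream out;
  if (filename.size() >= 3
      && filename.compare(filename.size() - 3, 3, ".gz") == 0) {
    out.push(boost::iostreams::gzip_compressor());
  }
  out.push(file);
  // ... serialize the database into `out` as usual ...
}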

QuantamHD avatar Nov 10 '23 06:11 QuantamHD

I think the reality is that would be a lot of work and complexity for gains on very specific systems. I don't think the juice is worth the squeeze.

What are you primarily trying to optimize? Storage cost or fetch speed?

I would also advise against storing odb files or other artifact-like things in git. There's not a great solution for long-term caching of these types of files, but putting them in git will store them in your history forever and will quickly run you into repo limits on git hosting platforms.

I don't propose to store .odb files in git. I was just using it as an example that one can expect artifact systems to have some sort of built-in compression for transmission over the wire.

My main concern is transmission speeds.

Storage costs I would mainly manage by pruning old builds.

oharboe avatar Nov 10 '23 06:11 oharboe

Yeah, I think Bazel is pretty agnostic to storage. I think the only way you get compression is with that flag I mentioned previously.

Nice find! I will experiment with it; it can be added to the .bazelrc under version control in my project, alongside WORKSPACE.bazel.

Also, my proposal is not to add compression to ORFS, but inside OpenROAD, so that if you run

write_db mydb.odb.gz it will be streamed into a gzip file. That would let you interact with pretty much any build system.

Perhaps someone will benefit from write_db mydb.odb.gz, but it doesn't address my concern in this feature request, so I consider it a separate feature request.

oharboe avatar Nov 10 '23 06:11 oharboe

OR already has support for GZip on LEF, DEF, and SPEF. I don't think it would be too much to ask for ODB as well.

rovinski avatar Nov 10 '23 17:11 rovinski

OR already has support for GZip on LEF, DEF, and SPEF. I don't think it would be too much to ask for ODB as well.

Sure, but not relevant to this feature request. This feature request is about reducing the compressed size. Compression happens outside of OpenROAD already.

oharboe avatar Nov 10 '23 17:11 oharboe

Ah, I did not read the edited original post. I don't really know what to do there, other than turning up the compression effort on GZip, which of course has computation-time implications.

rovinski avatar Nov 10 '23 18:11 rovinski

Ah, I did not read the edited original post. I don't really know what to do there, other than turning up the compression effort on GZip, which of course has computation-time implications.

There was some talk of a scheme to store repeating structures (introduced in floorplanning, filling, etc.) more efficiently.

oharboe avatar Nov 10 '23 18:11 oharboe

A first step would be to have some reporting of the size of the various sections of the file. My guess is that detailed routing will dominate, but it would be good to verify that. PDN is another possibility.
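
A minimal sketch of how such per-section reporting could be done, assuming the writer goes through a std::ostream (the helper class and names here are made up for illustration):

#include <iostream>
#include <ostream>
#include <string>

// Hypothetical RAII helper: measures how many bytes a section adds to the
// stream by diffing tellp() at construction and destruction.
class SectionSizeReport
{
 public:
  SectionSizeReport(std::ostream& os, const std::string& name)
      : os_(os), name_(name), start_(os.tellp())
  {
  }
  ~SectionSizeReport()
  {
    std::cerr << name_ << ": " << (os_.tellp() - start_) << " bytes\n";
  }

 private:
  std::ostream& os_;
  std::string name_;
  std::streampos start_;
};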

maliberty avatar Nov 11 '23 00:11 maliberty

I did some experiments; we can reduce the odb size by ~50% if we move to a VLQ encoding for uint32_t in dbStream.h.

See:

void dbOStream::writeUnsignedVlq(uint32_t c)
{
  if (c == 0) {
    _f.put(0);
    return;
  }

  // Each octet stores 7 payload bits plus a
  // continuation bit in the MSB slot.
  constexpr uint8_t mask_7bit = 0b01111111;
  while (c != 0) {
    uint8_t vlq = c & mask_7bit;
    c >>= 7;
    if (c != 0) {
      vlq |= 0b10000000;  // more octets follow
    }
    _f.put(vlq);
  }
}
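
For completeness, the matching reader would reassemble the 7-bit groups in the same least-significant-first order; a sketch along the same lines (the method name and stream member are assumed here, not existing odb API):

uint32_t dbIStream::readUnsignedVlq()
{
  uint32_t value = 0;
  int shift = 0;
  uint8_t octet;
  do {
    octet = _f.get();                                // one octet per iteration
    value |= uint32_t(octet & 0b01111111) << shift;  // 7 payload bits
    shift += 7;
  } while (octet & 0b10000000);                      // continuation bit set
  return value;
}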

On Jpeg sky130, the routed odb goes from 360MB to 180MB with VLQs. If you use ZSTD compression on the file, it shrinks to about 37MB.

QuantamHD avatar Nov 13 '23 07:11 QuantamHD

On Jpeg sky130 the routed odb goes from 360MB to 180MB. ZSTD compression is about 37MB.

Nice!

What was the difference in compressed size?

oharboe avatar Nov 13 '23 07:11 oharboe

1-2MB

QuantamHD avatar Nov 13 '23 07:11 QuantamHD

1-2MB

I see...

Not to be a party pooper, but then it looks like it is better to just leave this to generic compression than to make the OpenROAD code more complicated?

Unless this is simple and fast, in which case it is a win for the uncompressed case, i.e. the normal ORFS flow.

oharboe avatar Nov 13 '23 07:11 oharboe

Yeah, I think adding a streaming zstd encoder by default makes the most sense. It's already in Boost, so it'll be an easy add. It's pretty fast: 500MB/s+ encode speed and 2500MB/s read.
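
For reference, a minimal sketch of such a streaming encoder with Boost.Iostreams (zstd support requires Boost 1.78 or newer; the function shown is illustrative, not the actual writer):

#include <fstream>
#include <string>

#include <boost/iostreams/filter/zstd.hpp>
#include <boost/iostreams/filtering_stream.hpp>

// Hypothetical sketch: compress the database stream with zstd on write;
// reading would push boost::iostreams::zstd_decompressor() symmetrically.
void writeCompressedDbSketch(const std::string& filename)
{
  std::ofstream file(filename, std::ios::binary);
  boost::iostreams::filtering_ostream out;
  out.push(boost::iostreams::zstd_compressor());
  out.push(file);
  // ... serialize the database into `out` ...
}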

QuantamHD avatar Nov 13 '23 07:11 QuantamHD

Yeah, I think adding a streaming zstd encoder by default makes the most sense. It's already in Boost, so it'll be an easy add. It's pretty fast: 500MB/s+ encode speed and 2500MB/s read.

I see. A point of order, to be nit-picky: this feature request is about reducing the compressed size.

oharboe avatar Nov 13 '23 07:11 oharboe

I looked a bit over the weekend, and the most obvious place is that dbGCellGrid could be more efficient. It's hard to predict how much that will affect compressed size, though, as the compression might already be getting those gains.

maliberty avatar Nov 13 '23 16:11 maliberty

Maybe there could be a compressed structure for arrays of fill cells? Something that just stores {x_origin, y_origin, x_pitch, y_pitch, x_count, y_count} plus some way of incrementing the instance name.
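
To make that concrete, such a record could be as small as the following (the struct and field names are hypothetical, not existing odb structures):

#include <cstdint>
#include <string>

// Hypothetical compressed representation of a regular fill-cell array:
// one record instead of x_count * y_count individual instances.
struct FillCellArray
{
  int32_t x_origin;         // lower-left corner of the array (DBU)
  int32_t y_origin;
  int32_t x_pitch;          // spacing between adjacent instances (DBU)
  int32_t y_pitch;
  uint32_t x_count;         // number of columns
  uint32_t y_count;         // number of rows
  std::string name_prefix;  // instance name = name_prefix + running index
};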

rovinski avatar Nov 13 '23 19:11 rovinski

Maybe there could be a compressed structure for arrays of fill cells? Something that just stores {x_origin, y_origin, x_pitch, y_pitch, x_count, y_count} plus some way of incrementing the instance name.

Would that be better than zstd can do now?

oharboe avatar Nov 13 '23 20:11 oharboe

The only way to find out is to try 🤷‍♂️ Any manually coded scheme should beat a dictionary-based encoder, but it's a question of by how much.

rovinski avatar Nov 14 '23 04:11 rovinski

Just as a data point, bsdiff is not practical; it took hours.

$ bsdiff 3_place.odb 4_cts.odb patchfile
$ zip xx patchfile 
updating: patchfile (deflated 0%)
$ unzip -lv xx.zip 
Archive:  xx.zip
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
1820437880  Defl:N 203521160  89% 2023-12-31 15:14 d9f94a40  3_place.odb
1845729152  Defl:N 209114483  89% 2023-12-29 02:07 9859b31c  4_cts.odb
77743457  Defl:N 77694175   0% 2023-12-31 20:47 ffdfc498  patchfile
--------          -------  ---                            -------
3743910489         490329818  87%                            3 files

The difference in size between 3_place.odb and 4_cts.odb is 1845729152 - 1820437880 = 25291272 bytes, i.e. about 25MB.

The bsdiff patch is 77743457 bytes, i.e. about 77MB.

The compressed size of 4_cts is ca. 185MB.

Preliminary conclusion: bsdiff can reduce the compressed size for 4_cts by about 2x.

Given the disadvantages and complications of a binary diff approach, this isn't particularly promising.

oharboe avatar Jan 01 '24 14:01 oharboe

@maliberty An idea for macro placement specifically: write out a placement.tcl that fully describes the result of the macro placement .odb file, and read it back in at the next step. This completely eliminates the need for an .odb file for macro placement and also makes it easier to see what is going on in the macro placement stage, as the placement.tcl doubles as a report.

This idea alone isn't particularly exciting unless it is more broadly applicable, though. Are there other stages that can similarly be described by a small .tcl file?

oharboe avatar Jan 11 '24 16:01 oharboe

Not really. Even macro placement currently assigns std cell locations as well, so it wouldn't be that small.

maliberty avatar Jan 11 '24 16:01 maliberty

@maliberty Close?

I think we covered the difference in significance between the compressed and uncompressed .odb file sizes, and this is now well understood: uncompressed size affects runtime, compressed size affects network speed, and size on disk isn't particularly important.

There is no specific action or idea here; compressed size can always be improved...

oharboe avatar Mar 06 '24 06:03 oharboe