
`compute-size` should probably not include the size of RTDs

Open dpk opened this issue 4 months ago • 3 comments

The result of compute-size appears to include the size of RTDs for user-defined records, but not the sizes of the record types used to implement the types that come with Chez Scheme. This is at least confusing to implementers of alternative data structures, and at worst unfair, since it may give users the misleading impression that these alternative implementations use significantly more memory per instance than the built-in Chez Scheme types, even when their memory usage is actually lower.

A concrete example: I was curious how the memory usage of my sample implementation of SRFI 250 stacked up against the standard Chez Scheme R6RS hashtable implementation.

The code used to create the hash tables for comparison:

(define srfi-250-alphabet-table
  (hash-table-unfold (lambda (c) (char>? c #\z))
                     (lambda (c) (values c (char-upcase c)))
                     (lambda (c) (integer->char (+ 1 (char->integer c))))
                     #\a
                     (make-comparator char? char=? #f (lambda (c) (number-hash (char->integer c))))
                     26))

(define r6rs-alphabet-table
  (let ((ht (make-hashtable (lambda (c) (number-hash (char->integer c))) char=? 26)))
    (let loop ((a 97))
      (if (> a 122)
          ht
          (begin
            (hashtable-set! ht (integer->char a) (char-upcase (integer->char a)))
            (loop (+ a 1)))))))
> (compute-size r6rs-alphabet-table)
1184
> (compute-size srfi-250-alphabet-table)
1006832

It really puzzled me why this small 26-entry hash table was taking up so much more memory in my implementation than in the Chez one: over a megabyte! Moreover, the inspector was not much help because I had made the hash-table type from SRFI 250 opaque, so I could not see the individual fields there to compute their respective sizes and see where the bloat was coming from.

After changing the definition to be transparent, I went back into the inspector and totted up the size results on each of the fields, all of which were small, as I expected. Only then did it occur to me to check the size of the rtd: it weighs 1006240 bytes all on its own! Subtracting its weight (1006832 - 1006240, done outside of Chez Scheme) gives the far more reasonable answer 592. Cool, my implementation is 50% more memory-efficient for this hash table! But it is a problem that my more memory-efficient implementation appears vastly more bloated, because the built-in type effectively ‘cheats’: the size of its RTD is never counted.

I also think it is especially a problem, in cases like this, that there is no way to find out how much the RTD weighs according to compute-size if the record type is opaque. I think record types used as encapsulation for generic data structure implementations (rather than as domain-specific ‘plain old data objects’) should be opaque as a matter of course, since their implementation details should not bother the user of the data structure and they should not be tempted to extract information from it. (The Chez Scheme developers may disagree, since the built-in hash tables are not opaque!)

dpk (Aug 11 '25 22:08)

The rtd (and code!) for the native hash tables is already in the static generation. You can have its size included by passing 'static as the second arg to compute-size. Or at least that reports a considerably larger size; I did not attempt to calculate what the total "should" be.

If you want a similar treatment of your own data structures, collect into the static generation after defining the record type before running your tests. (Or use a smaller generation and ask for one less in the compute-size call, as in the example given in CSUG for compute-size.)
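A minimal sketch of that suggestion, using a stand-in record type rather than the actual SRFI 250 hash-table type (box-like and its field are purely illustrative):

```scheme
;; Illustrative stand-in for a data-structure record type.
(define-record-type box-like (fields value))

;; Promote everything allocated so far -- including box-like's RTD --
;; into the static generation, so it no longer counts toward the
;; default result of compute-size.
(collect (collect-maximum-generation) 'static)

;; Instances created afterwards are then measured without the RTD:
(define b (make-box-like 42))
(compute-size b)           ; size of the instance alone
(compute-size b 'static)   ; passing 'static includes static data too
```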

jltaylor-us (Aug 12 '25 01:08)

Thanks for the tip!

I nonetheless think that the default should be that RTDs don’t count towards the computed size of records.

dpk (Aug 12 '25 10:08)

I nonetheless think that the default should be that RTDs don’t count towards the computed size of records.

To me it seems that RTDs are just one example of the broader caveat given in the introduction to CSUG § 3.6:

Objects sometimes point to a great deal more than one might expect. For example, if static data is included, the procedure value of (lambda (x) x) points indirectly to the exception handling subsystem (because of the argument-count check) and many other things as a result of that.

Treating RTDs as a special case would seem only to make the broader situation more confusing. To predictably exclude some data from the count, it seems necessary either to set things up with collect (and perhaps also collect-maximum-generation) or to use compute-size-increments.
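For reference, a sketch of the compute-size-increments approach as I understand it (the record type here is an illustrative stand-in, not the SRFI 250 type): compute-size-increments charges shared structure only to the earliest object in the list that reaches it, so listing the RTD first yields an instance size that excludes it.

```scheme
;; Illustrative stand-in record type (non-opaque, so record-rtd works).
(define-record-type box-like (fields value))
(define b (make-box-like 42))

;; First element: the RTD itself.  Second element: the instance,
;; charged only for structure not already reached via the RTD.
(compute-size-increments (list (record-rtd b) b))
;; => a two-element list of sizes in bytes
```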

LiberalArtist (Aug 29 '25 04:08)