
Huge memory model is non-optimal, suggest memory model change to large 16-bit

Open joncampbell123 opened this issue 8 years ago • 13 comments

Just a heads up: the huge memory model has the benefit that the compiler adjusts pointers for you, which makes pointer math easier, but it also adds performance loss and overhead to your code.

The large model will give you all the benefits of far code and data without the overhead of the compiler and C library's pointer adjustment routines.

You will have to normalize pointers yourself and deal with crossing 64KB segments for large data, but it's worth it.

Also, the compiled binaries for Large model are significantly smaller than Huge model binaries.

joncampbell123 avatar Dec 12 '16 11:12 joncampbell123

ah ok thanks

sparky4 avatar Dec 15 '16 17:12 sparky4

ah large model breaks scroll's ability to draw chikyuu properly hmmmm

sparky4 avatar Dec 17 '16 23:12 sparky4

Take into account that if you use arrays longer than one segment, you need the huge memory model (it handles pointers across segment boundaries) or you must handle this case yourself.

jmalak avatar Dec 18 '16 07:12 jmalak

The VRS rendering code assumes that the offset portion of the pointer is such that rendering a scanline never crosses 64KB. Try normalizing the pointer yourself before rendering.

Basic (slow) example code:

    unsigned long a = ((unsigned long)FP_SEG(ptr) << 4UL) + (unsigned long)FP_OFF(ptr);
    ptr = MK_FP((unsigned)(a >> 4UL), (unsigned)(a & 0xFUL));

(Note the cast on FP_SEG() before the shift; shifting the bare 16-bit segment value left by 4 would lose the high bits.)

Is there anything in your code that uses an array that exceeds 64KB? You might shrink the array, or set up the array in segments so that no part of it exceeds 64KB.

joncampbell123 avatar Dec 18 '16 07:12 joncampbell123

You may also consider dynamically allocating the array segments as well.

joncampbell123 avatar Dec 18 '16 07:12 joncampbell123

Might I recommend a solution? Switch the arrays out for a matrix that uses page alignment for lines.

With X and Y as the address, X is the pointer (offset) and Y is the segment designator.

For simplified code:

    Data Segment = Base + (Y * PageSize)
    Data Pointer = X

If you REALLY need a linear array, you can then virtualize it into pages by:

    X = L modulo 4K   (utilizing truncation of the high bits)
    Y = L / 4K        (integer truncation of the low bits, then shifting the high bits low)

This of course uses 4K segments; you can use any multiple of 4K, but I recommend keeping to powers of two, so the arithmetic simplifies into binary operations and no multiply/divide is needed.

Ruedii avatar Dec 18 '16 07:12 Ruedii

With OW you can use the large memory model (more efficient), and the appropriate variables/pointers can be marked as __huge so the correct (slow) arithmetic is used only for those variables/pointers.
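A sketch of what that looks like; this only compiles for a 16-bit Open Watcom target (e.g. wcl -ml), and the names here are illustrative:

```c
/* Open Watcom, 16-bit large memory model.  Only this one pointer uses
   huge arithmetic; the rest of the program keeps large-model speed. */
long __huge *bigtab;    /* pointer arithmetic may cross 64 KB safely */

void fill(long __huge *p, unsigned long n)
{
    unsigned long i;
    for (i = 0; i < n; i++)
        p[i] = (long)i;  /* compiler emits segment-crossing math here */
}
```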

jmalak avatar Dec 18 '16 09:12 jmalak

well holy shit it is SIGNIFICANTLY FASTER GOD DAMN!

sparky4 avatar Dec 19 '16 18:12 sparky4

Here's hoping you're not joking or being sarcastic. Good luck :)

joncampbell123 avatar Dec 19 '16 18:12 joncampbell123

The method I mentioned is very clean, it's an old method for fixed segment databases.

Ruedii avatar Jan 04 '17 19:01 Ruedii

@Ruedii wolf 3d does a pre-calculation of the render offsets in an array ..

sparky4 avatar Jun 05 '18 15:06 sparky4

Sorry for the late reply; this got lost in my pile, and my life has been quite busy as well, mostly family issues for the past year.

The x and y variable names are confusing; I should have used Ps and Po for "pointer segment" and "pointer offset". Of course, in rendering you can make this conversion as direct as possible by using certain block sizes.

It might be good to dynamically assign the segment block size based on the platform, if it's capable. Each Intel generation buffers in larger and larger chunks: the 486 added an optional cache, and the L1 data cache slowly grows with later generations and can usually be detected on all CPUs 486 and later.

Further assembler optimization of array memory access would be done in the C library itself, or in supplementary math libraries you can add. Arrays simply add one more multiplier (the component object size) to the pointer calculation. As long as the component object is smaller than the preferred memory page size, you can handle accesses that cross pages.

Since you aren't protecting pages individually, accessing an object that crosses into the next page via an offset from the previous page should only cause a small performance loss, which should be a non-issue. However, choosing your object size to be an even divisor of the page size prevents the issue altogether, hands-off. This may mean adding a bit of padding, which you can reuse for some nice metadata flags or the like.

If you wish to add streaming basic math and copy routines, the proper way to handle them is to run the stream-handler loop so that the pointer for the next data item is loaded into the data pointer register immediately after the current data is pulled into a register, before computing on that data. The full time spent operating on the data in the register then gives the memory time to have its rows flipped. This will particularly help on 486 and later processors, which have a (very small) internal L1 data buffer or cache.

Ruedii avatar Apr 04 '19 18:04 Ruedii

i honestly been swamped with school work constantly, and lack of help made me not work on it in general. lots have been going on with my mental health, but i am getting more stable. i ain't dead, nor have i forgot the project; i just been super busy with school, that's all. the biggest problem is i don't have the XT sitting around, as it is in storage, so i cannot really test the game on authentic hardware except a 286. i will continue once my life is better and i'm not grinding away at school.

sparky4 avatar Mar 17 '22 00:03 sparky4