Support for other code pages in text terminal?
I need to present information in the text terminal in a language other than English. Does the text terminal support displaying letters with diacritics? If not, I am going to implement it ASAP. I miss this feature a lot.
Can you give an example string with these diacritic symbols? Here is a similar issue with xterm.js: https://github.com/copy/v86/issues/927
I think the issue is that DisplayAdapter is not informed about the Codepage being used by the OS.
ScreenAdapter.put_char() is called with the character's byte code, so without knowing the active code page it's impossible to know which Unicode character to map to.
I have a hacked ScreenAdapter that lets me override the fixed code page CP437 with CP850. It works for me, but it wouldn't work in general. I think some INT 21h call sets the code page; I don't know enough about it, but I figure this would be the place where a bus event should be generated for ScreenAdapter to react to.
EDIT: CP437 is hardcoded into ScreenAdapter here.
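To illustrate why the byte code alone is not enough, here is a minimal sketch of a lookup that takes the code page as an explicit parameter. The names `SAMPLE_HIGH` and `byteToChar` are hypothetical and not part of v86, and only three sample bytes are listed instead of the full 128-entry upper half.

```javascript
// Hypothetical sample tables: only three of the 128 upper-half entries
// (0x80..0xFF) are listed for each code page.
const SAMPLE_HIGH = {
    cp437: { 0x82: 0x00E9, 0x9B: 0x00A2, 0x9D: 0x00A5 }, // e-acute, cent sign, yen sign
    cp850: { 0x82: 0x00E9, 0x9B: 0x00F8, 0x9D: 0x00D8 }, // e-acute, o-slash, O-slash
};

// Map one screen byte to a string, given the active code page.
function byteToChar(byte, codePage) {
    if (byte < 0x80) {
        // Plain ASCII; the CP437 control-code glyphs are ignored in this sketch.
        return String.fromCharCode(byte);
    }
    const table = SAMPLE_HIGH[codePage];
    const codePoint = table ? table[byte] : undefined;
    // Unknown code page or unlisted byte: fall back to the replacement character.
    return codePoint === undefined ? "\uFFFD" : String.fromCodePoint(codePoint);
}
```

The same byte 0x9B yields the cent sign under CP437 and the o-slash under CP850, which is exactly the information put_char() is missing.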
I think it's due to how v86 renders text.
Also, if someone needs, there are json files with charmap_high arrays for each encoding.
Hey @Pixelsuft, I was planning to look for exactly these arrays, so thanks!
Just two quick questions (I believe you're the author):
First, all code page mappings in cp367.json and cp65001.json are mapping to replacement character U+FFFD. Are they invalid?
Second, five code page mappings have less than 128 items:
- cp932.json (101 items)
- cp936.json (71)
- cp949.json (67)
- cp950.json (83)
- cp1361.json (76)
Are the missing items just 1:1 mappings to CP437?
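The two checks described above (entry count and all-replacement-character tables) can be sketched as a small validator. The function name `checkCharmapHigh` is hypothetical, and because the exact element type in those JSON files is not stated here, the sketch accepts either the number 0xFFFD or the string "\uFFFD" as a replacement marker.

```javascript
// Report suspicious properties of a charmap_high array (the mapping for
// bytes 0x80..0xFF). Returns a list of human-readable problems.
function checkCharmapHigh(charmapHigh) {
    if (!Array.isArray(charmapHigh)) {
        return ["not an array"];
    }
    const problems = [];
    if (charmapHigh.length !== 128) {
        problems.push(`expected 128 entries, got ${charmapHigh.length}`);
    }
    // Assumption: entries are either code point numbers or one-character strings.
    const isReplacement = (entry) => entry === 0xFFFD || entry === "\uFFFD";
    if (charmapHigh.length > 0 && charmapHigh.every(isReplacement)) {
        problems.push("every entry is U+FFFD");
    }
    return problems;
}
```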
Hello, sorry for the late answer. I was playing around with exactly that: the hard-coded code page, substituting CP1250 and ISO-8859-2 for it. But I have not been successful. It also depends on the terminal configuration of the embedded OS.
I will provide the Bash script for generating all the code page maps supported by Linux, if requested. But what needs to be configured in the Linux OS to make it work is outside my expertise. My distribution is ArchLinux32. I welcome any ideas or help.
I provide the script to generate the encodings using Bash in Linux.
Usage:
generator.sh [encoding_name] - generate JSON output for given encoding
generator.sh STDIN - read encodings to generate from standard input
generator.sh - generate JSON output for all encodings - takes 45-60 min
generator.sh --help - show this help
The generator file is included as an attachment.
#!/bin/bash
###################################
# GENERATE ALL CODE PAGES TO JSON #
###################################
# Turn Exit of Error ON
set -e
# Output character using its code number
chr() {
printf "\\$(printf '%03o' "$1")"
}
# Convert the number with given codepage to Unicode Hex Code
toUnicode() {
set -o pipefail
CODE=$(chr "$1" | iconv --from-code="$2" --to-code=UTF16LE 2>/dev/null | hexdump | sed "2d" | awk '{ print "0x" $2 }')
if [ $? -ne 0 ] ; then
# Error - the given character is not possibly supported by the given encoding
echo -n "\"N/A\""
else
# Convert HEX code to DEC - JSON does not support numbers in HEX
printf "%d" $CODE
fi
set +o pipefail
}
# Check if the name of encoding is correct
isEncodingValid() {
local ENCODINGS=$(iconv -l | sed 's/\/\///')
local ENC=""
for ENC in $ENCODINGS ; do
if [ "$ENC" = "$1" ] ; then
# Encoding found, exit with success
return 0
fi
done
# Invalid encoding - Exit with Error
return 1
}
# First check the first argument if any encoding is given
if [ -n "$1" ] ; then
if [ "$1" = "--help" ] ; then
echo "Usage:"
echo " generator.sh [encoding_name] - generate JSON output for given encoding "
echo " generator.sh STDIN - read encodings to generate from standard input"
echo " generator.sh - generate JSON output for all encodings - takes 45-60 min"
echo
echo " generator.sh --help - show this help"
echo
exit
fi
# if STDIN argument is provided, read the required encodings to generate from Standard Input
if [ "$1" = "STDIN" ] ; then
ENCODINGS=""
set +e
while read ENC ; do
if [[ $ENC =~ ^[[:space:]]*$ ]] ; then # Ignore empty or whitespace-only lines
continue
fi
isEncodingValid "$ENC"
if [ $? -ne 0 ] ; then
echo "Invalid encoding name $ENC" >&2
exit 1
fi
ENCODINGS="$ENCODINGS $ENC"
done
set -e
else
set +e
isEncodingValid "$1"
if [ $? -ne 0 ] ; then
echo "Invalid encoding name $1" >&2
exit 1
fi
set -e
ENCODINGS="$1"
fi
else
# Query all encodings supported by Linux
ENCODINGS=$(iconv -l | sed 's/\/\///')
fi
# Begin JSON object
echo "{"
# Iterate all encodings
FIRST_ITEM=1
for ENCODING in $ENCODINGS ; do
# If NOT the first item, write Comma Separator
if [ $FIRST_ITEM -eq 1 ] ; then
FIRST_ITEM=0
else
echo ","
fi
# Write the Encoding as the JSON object key
echo -n " \""
echo -n $ENCODING
echo -n "\""
# Begin CodePage Array
echo -n ": ["
# Iterate all characters
FIRST_CODE=1
for CODE in {0..255} ; do
if [ $FIRST_CODE -eq 1 ] ; then FIRST_CODE=0 ; else echo -n "," ; fi
UNICODE_CODE=$(toUnicode "$CODE" "$ENCODING")
echo -n $UNICODE_CODE
done
# End CodePage Array
echo -n "]"
done
# End JSON object
echo
echo "}"
I think it's due to how v86 renders text. Also, if someone needs, there are json files with charmap_high arrays for each encoding.
Hey @Pixelsuft, I was planning to look for exactly these arrays, so thanks!
Just two quick questions (I believe you're the author):
First, all code page mappings in cp367.json and cp65001.json are mapping to replacement character U+FFFD. Are they invalid?
Second, five code page mappings have less than 128 items:
- cp932.json (101 items)
- cp936.json (71)
- cp949.json (67)
- cp950.json (83)
- cp1361.json (76)
Are the missing items just 1:1 mappings to CP437?
I generated those JSON files a long time ago with a Python script and only tested a few encodings, which worked, so I don't know.
Try to validate them against my Bash script. It uses the iconv tool, so its output should be valid. There may still be cases where it fails to convert a character for some reason; the value emitted for that character code is then the string "N/A".
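For a second opinion that does not depend on iconv, a table can also be built with the standard TextDecoder API. This is only a sketch with a hypothetical function name, and it has a real limitation: TextDecoder follows the WHATWG Encoding Standard, which to my knowledge covers windows-125x, ISO-8859-x and IBM866 but not DOS code pages such as CP437 or CP850, so it can only cross-check part of the list.

```javascript
// Build a 256-entry byte -> Unicode code point table for one encoding label.
// Throws a RangeError if the runtime does not know the label.
function buildTable(label) {
    const decoder = new TextDecoder(label);
    const table = [];
    for (let byte = 0; byte < 256; byte++) {
        // Decode each byte on its own; single-byte encodings yield one character.
        const text = decoder.decode(Uint8Array.of(byte));
        table.push(text.length > 0 ? text.codePointAt(0) : null);
    }
    return table;
}
```

Comparing `buildTable("windows-1250")` entry by entry with the script's CP1250 output should show quickly whether the two sources agree.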
@SuperMaxusa: Is there any possibility for the CopySH emulator to support full UTF-8 in addition to the given code pages?
Code point definitions for many PC Code Pages can be found at www.unicode.org:
- https://www.unicode.org/Public/MAPPINGS/
Specifically:
- https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/
- https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/
Special care needs to be taken for several graphical characters that represent non-printable symbols, 0x01..0x1f and 0x7f (DEL) amongst them. Their mappings are defined here:
- https://www.unicode.org/Public/MAPPINGS/VENDORS/MISC/IBMGRAPH.TXT
I would recommend using these.
Thank you for info. I did not know about this. I could "wget" it and parse using AWK to JSON.
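As an alternative to AWK, the parsing step might look like the sketch below. It assumes the layout used by the files under VENDORS/MICSFT: `#` starts a comment, the first column is the byte in hex, the second is the Unicode code point in hex, and undefined bytes have no second column. The function name `parseMapping` is hypothetical, and IBMGRAPH.TXT appears to use a different column order, so it is not handled here.

```javascript
// Parse the text of a unicode.org single-byte mapping file into a
// 256-entry array of code points (null where the byte is undefined).
function parseMapping(text) {
    const table = new Array(256).fill(null);
    for (const line of text.split(/\r?\n/)) {
        // Drop the trailing comment and surrounding whitespace.
        const body = line.split("#")[0].trim();
        if (body === "") {
            continue;
        }
        const [byteField, unicodeField] = body.split(/\s+/);
        const byte = parseInt(byteField, 16); // parseInt accepts the 0x prefix
        if (!(byte >= 0 && byte <= 255)) {
            continue; // not a single-byte entry, skip it
        }
        table[byte] = unicodeField === undefined ? null : parseInt(unicodeField, 16);
    }
    return table;
}
```

`JSON.stringify(parseMapping(text).slice(128))` would then give an upper-half array comparable to the charmap_high files mentioned earlier.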
Is there any possibility for the CopySH emulator to support full UTF-8 in addition to the given code pages?
If you mean VGA text mode, probably not, because EGA and VGA are limited to CP437 support, as @chschnell noticed:
EDIT: CP437 is hardcoded into ScreenAdapter here.
IIRC, display drivers for MS-DOS (like display.sys) load a custom font into VGA RAM, but I'm not sure that works on v86: https://github.com/microsoft/MS-DOS/blob/2d04cacc5322951f187bb17e017c12920ac8ebe2/v4.0/src/DEV/DISPLAY/INT10COM.INC#L3-L35, https://wiki.osdev.org/VGA_Fonts#Set_VGA_fonts
Thank you for info. I did not know about this. I could "wget" it and parse using AWK to JSON.
In dosbox-x repository, you can find a tool to convert these files to unicode arrays: https://github.com/joncampbell123/dosbox-x/blob/master/contrib/mappings/db2u.pl
IIRC, display drivers for MS-DOS (like display.sys) load a custom font into VGA RAM, but I'm not sure that works on v86: https://github.com/microsoft/MS-DOS/blob/2d04cacc5322951f187bb17e017c12920ac8ebe2/v4.0/src/DEV/DISPLAY/INT10COM.INC#L3-L35, https://wiki.osdev.org/VGA_Fonts#Set_VGA_fonts
Unlike other emulators, which draw characters pixel by pixel, v86 just renders text mode using HTML, so it will probably be difficult to automatically detect the encoding from VGA RAM fonts.
@Pixelsuft Yes, from reading the source code of vga.js I have noticed that it is not possible to support UTF-8 directly. I also caused some confusion about the OS I use: I am not using MS-DOS but ArchLinux32. I need to get familiar with how to configure the terminal and console settings and how to remap the ASCII codes to Unicode in the ScreenAdapter. I will share the results as soon as I succeed with this issue. Possibly I will prepare a PR.
Attached a ZIP containing the Codepage-to-Codepoint mappings in Javascript, along with the Python script I wrote to create it from the raw files from www.unicode.org.
Supported codepages:
CP437, CP737, CP775, CP850, CP852, CP855, CP857, CP860,
CP861, CP862, CP863, CP864, CP865, CP866, CP869, CP874,
CP1250, CP1251, CP1252, CP1253, CP1254, CP1255, CP1256,
CP1257, CP1258
Would it be possible to generate a bus event for DisplayAdapter whenever the OS changes its system codepage? I'd be willing to implement it, but I'm not sure where to start. codepage_converter.zip
Unlike other emulators, which draw characters pixel by pixel, v86 just renders text mode using HTML, so it will probably be difficult to automatically detect the encoding from VGA RAM fonts.
Drawing VGA fonts on a canvas would be fairly simple, but I like the fact that you can copy-paste from the VGA screen.
Would it be possible to generate a bus event for DisplayAdapter whenever the OS changes its system codepage? I'd be willing to implement it, but I'm not sure where to start.
I'd accept a PR, but I don't know how the OS communicates to the vga controller which code page to use (and if it does at all). Alternatively, I'd also accept a PR that (optionally) renders the vga screen on a canvas, or sets the code page manually.
Well, I guess this is a good time then to present a little experiment I made which implements text screen on a canvas.
First, here's a test page without a running V86 instance in the background: Demo 1.
On the top left, first click on "Start", play around with the settings, then click on "Demo", test "Fullscreen". This demo is designed to cause permanent repaints, everything is drawn pixel-by-pixel in an AnimationFrame-loop ~60 times per second. In earlier tests I measured ~1.25ms (average of 100 runs) for a full screen 80x25 repaint on Firefox, and ~0.7ms under Chrome, though it is not easy currently to measure properly.
Text rendering is implemented in class TextCanvas, which is designed tightly around V86's ScreenAdapter, so here's Demo 2 with a running V86 instance in the back using a custom TextCanvas-ScreenAdapter.
Click on Machine -> Boot in the menu and wait ~15 seconds for the image to download before it starts to boot. It's a 20M FreeDOS with Monkey Island for testing. You can also upload your own image: stop the Machine and select Harddisk -> Import from the menu (the V86 instance has 256M RAM and 16M VGA RAM). Run Monkey Island to see that text and graphics mode share the same DOM canvas without interfering (press CTRL+Q to exit Monkey Island).
However, this is fundamentally misdesigned, though I do think it contains some important building blocks for this task.
Conceptually, this should be moved into the VGA emulator, ScreenAdapter is obviously the wrong place.
I've read a bit into it, but it is still a bit of a mystery to me how it is supposed to work, and how OS and VGA card play together here, and if the BIOS is involved. "Code Page" is a high-level concept, what I wrote earlier about INT 21h is merely an OS-specific DOS-concept and as such misplaced here.
From what I understand so far, the VGA card provides several 8-bit font banks, and the OS may upload fonts into these banks. I think this is where "Code Pages" happen, and the VGA card never needs to know about the details beyond the bitmaps. There are separate fonts for 25 and for 50 text rows (16 and 8 scanlines height, respectively). The VGA card knows everything required to implement text mode, but I still have a lot of learning to do here; any pointers would be greatly appreciated.
A few details on the fonts, I've used these two:
Text is drawn onto the screen pixel-by-pixel without using canvas's strokeText() or fillText() methods. For that I converted these fonts into bitmaps, character set being the union of unicode codepoints of all 8-bit code pages I am using. TextCanvas selects the active subset of 256 glyphs based on the active code page. So I only need a single font bitmap file to cover all code pages.
I think it's better to just get VGA RAM fonts working somehow
I tinkered around with BIOS int 10h, which interfaces the graphics subsystem to switch screen modes, upload fonts etc.
I wrote two little C programs, one uses int 10h to interface the very old VGA subsystem, and the other uses the VESA BIOS extension. It's been 30 years since I wrote software in real mode, so that was quite some fun :)
If you want to follow along, here's the image containing FreeDOS, my sources and the C toolchain (image is configured in German, but that shouldn't matter): FreeDOS-256m-de.zip (87M).
After booting, enter:
cd gfxtest
nmake
Now you have two executables, VGATEST.EXE and VESATEST.EXE, run
VESATEST.EXE -b
To get this list of supported graphics modes (that's vgabios.bin answering here):
[ 1] 0x0100: 640x400 8bpp attr=1 planes=1 memm=4 r=0:0 g=0:0 b=0:0
[ 2] 0x0101: 640x480 8bpp attr=1 planes=1 memm=4 r=0:0 g=0:0 b=0:0
[ 4] 0x0103: 800x600 8bpp attr=1 planes=1 memm=4 r=0:0 g=0:0 b=0:0
[ 6] 0x0105: 1024x768 8bpp attr=1 planes=1 memm=4 r=0:0 g=0:0 b=0:0
[ 8] 0x0107: 1280x1024 8bpp attr=1 planes=1 memm=4 r=0:0 g=0:0 b=0:0
[ 9] 0x010d: 320x200 15bpp attr=1 planes=1 memm=6 r=5:a g=5:5 b=5:0
[10] 0x010e: 320x200 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[11] 0x010f: 320x200 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[12] 0x0110: 640x480 15bpp attr=1 planes=1 memm=6 r=5:a g=5:5 b=5:0
[13] 0x0111: 640x480 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[14] 0x0112: 640x480 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[15] 0x0113: 800x600 15bpp attr=1 planes=1 memm=6 r=5:a g=5:5 b=5:0
[16] 0x0114: 800x600 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[17] 0x0115: 800x600 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[18] 0x0116: 1024x768 15bpp attr=1 planes=1 memm=6 r=5:a g=5:5 b=5:0
[19] 0x0117: 1024x768 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[20] 0x0118: 1024x768 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[21] 0x0119: 1280x1024 15bpp attr=1 planes=1 memm=6 r=5:a g=5:5 b=5:0
[22] 0x011a: 1280x1024 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[23] 0x011b: 1280x1024 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[24] 0x011c: 1600x1200 8bpp attr=1 planes=1 memm=4 r=0:0 g=0:0 b=0:0
[25] 0x011d: 1600x1200 15bpp attr=1 planes=1 memm=6 r=5:a g=5:5 b=5:0
[26] 0x011e: 1600x1200 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[27] 0x011f: 1600x1200 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[28] 0x0140: 320x200 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[29] 0x0141: 640x400 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[30] 0x0142: 640x480 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[31] 0x0143: 800x600 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[32] 0x0144: 1024x768 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[33] 0x0145: 1280x1024 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[34] 0x0146: 320x200 8bpp attr=1 planes=1 memm=4 r=0:0 g=0:0 b=0:0
[35] 0x0147: 1600x1200 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[36] 0x0148: 1152x864 8bpp attr=1 planes=1 memm=4 r=0:0 g=0:0 b=0:0
[37] 0x0149: 1152x864 15bpp attr=1 planes=1 memm=6 r=5:a g=5:5 b=5:0
[38] 0x014a: 1152x864 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[39] 0x014b: 1152x864 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[40] 0x014c: 1152x864 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[41] 0x0175: 1280x768 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[42] 0x0176: 1280x768 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[43] 0x0177: 1280x768 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[44] 0x0178: 1280x800 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[45] 0x0179: 1280x800 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[46] 0x017a: 1280x800 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[47] 0x017b: 1280x960 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[48] 0x017c: 1280x960 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[49] 0x017d: 1280x960 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[50] 0x017e: 1440x900 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[51] 0x017f: 1440x900 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[52] 0x0180: 1440x900 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[53] 0x0181: 1400x1050 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[54] 0x0182: 1400x1050 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[55] 0x0183: 1400x1050 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[56] 0x0184: 1680x1050 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[57] 0x0185: 1680x1050 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[58] 0x0186: 1680x1050 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[59] 0x0187: 1920x1200 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[60] 0x0188: 1920x1200 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[61] 0x0189: 1920x1200 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[62] 0x018a: 2560x1600 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[63] 0x018b: 2560x1600 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[64] 0x018c: 2560x1600 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[65] 0x018d: 1280x720 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[66] 0x018e: 1280x720 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[67] 0x018f: 1280x720 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[68] 0x0190: 1920x1080 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[69] 0x0191: 1920x1080 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[70] 0x0192: 1920x1080 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[71] 0x0193: 1600x900 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[72] 0x0194: 1600x900 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[73] 0x0195: 1600x900 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[74] 0x0196: 2560x1440 16bpp attr=1 planes=1 memm=6 r=5:b g=6:5 b=5:0
[75] 0x0197: 2560x1440 24bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[76] 0x0198: 2560x1440 32bpp attr=1 planes=1 memm=6 r=8:10 g=8:8 b=8:0
[91] 0x0013: 320x200 8bpp attr=1 planes=1 memm=4 r=0:0 g=0:0 b=0:0
Then, for example, switch to 800x600 32bpp using:
VESATEST.EXE 0x0143
You've switched into a graphics mode, but you can still see the text console and interact with it. That's a graphical font built into vgabios.bin, I believe (there's a chance that FreeDOS uploaded this font into the graphics card at boot time, I don't know yet).
I guess that is how it's supposed to work, and it looks ~broken~ [EDIT: ... the same as under 86Box emulator, so I guess it works as expected]. You can see that if you enter the HELP command when in any of these graphics modes (exit HELP by pressing ESC twice, it works even if you can't see it; restore 80x25 Text mode with VESATEST.EXE 3). ~So far I have no clue what's broken here.~
I believe the text-fonts are located in the graphics card ROM, whereas the graphics fonts can be replaced - can anyone confirm this?
The VGA font table is actually used in text mode only; the OS can freely upload sets of 256 glyphs as it wishes.
On the other hand, in VGA text mode plane 0 holds the character codes, plane 1 the attributes, and plane 2 the glyph bitmaps (see "Memory Layout in text modes")!
Now it makes sense.
If in VGA text mode text was drawn to the canvas using the glyph bitmaps from plane 2 then it should all work as expected.
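A minimal sketch of what reading a glyph from plane 2 could look like. It assumes the usual VGA font layout, where each character occupies a 32-byte slot with one byte per scan line (most significant bit leftmost), and it takes plane 2 as a plain Uint8Array. The function names and the `charsetOffset` parameter are hypothetical and do not refer to existing vga.js code.

```javascript
// Each glyph occupies a fixed 32-byte slot in the font plane, regardless of
// how many scan lines the active font actually uses.
const GLYPH_STRIDE = 32;

// Return the row bytes (one per scan line) of a glyph stored in plane 2.
function glyphRows(plane2, charCode, fontHeight, charsetOffset = 0) {
    const start = charsetOffset + charCode * GLYPH_STRIDE;
    return Array.from(plane2.subarray(start, start + fontHeight));
}

// Expand one row byte into eight pixels, leftmost pixel first.
function rowToPixels(rowByte) {
    const pixels = [];
    for (let bit = 7; bit >= 0; bit--) {
        pixels.push((rowByte >> bit) & 1);
    }
    return pixels;
}
```

A renderer would then combine these pixels with the character's attribute byte from plane 1 to pick foreground and background colors.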
Selecting text and copying it to the clipboard cannot work out of the box because the (OS-dependent) code page information needed to map the 8-bit character codes to their respective Unicode code points is not generally available (I couldn't find anything in this respect).
I will change my demo to simply download the font from plane 2 using int 10h and use that instead of my scanned bitmaps. It should really be implemented in vga.js.
EDIT: Currently, vga_memory_write() in vga.js discards font data that it receives, and VGAScreen.vga_memory_write_text_mode() would need to be replaced entirely. Cursor emulation needs to be considered, too. Since several bus events (like "screen-put-char") would need to be dropped, this is a breaking change. Anything else to consider?
You've switched into a graphics mode, but you can still see the text console and interact with it. That's a graphical font built into vgabios.bin, I believe (there's a chance that FreeDOS uploaded this font into the graphics card at boot time, I don't know yet).
If I am not mistaken, with teletype int 10h, ah=13h you can write text in graphics mode (for example, https://copy.sh/v86?profile=hello-v86 uses this) and without using any custom preloaded fonts except standard VGA fonts.
On the other hand, in VGA text mode plane 0 holds the character codes, plane 1 the attributes, and plane 2 the glyph bitmaps (see "Memory Layout in text modes")!
I think it is unlikely that the VGA fonts and glyphs stay in the planes when you set a graphics mode. I have read some code in the Bochs VGA BIOS, and the fonts seem to be taken from another place and drawn pixel by pixel into the framebuffer when a character is written:
- https://github.com/qemu/vgabios/blob/19ea12c230ded95928ecaef0db47a82231c2e485/vgabios.c#L1999-L2030
- https://github.com/qemu/vgabios/blob/19ea12c230ded95928ecaef0db47a82231c2e485/vgabios.c#L1586-L1611
Thank you for your input, great links!
You're right about the teletype int 10h, I hadn't come around to try it out yet.
Regarding fonts in plane 2, in biosfn_set_video_mode(mode) they load the ROM's 8x16-font using int 10h when switching into a text mode (not the OS's custom one), see line 1018. Maybe the OS is expected to reupload its font after a mode change?
Maybe the OS is expected to reupload its font after a mode change?
Makes sense. I have found the SCREEN option for FreeDOS, and a few words caught my attention:
Some newer graphics cards may not have 8x14 fonts in the BIOS. In that case, a driver can be loaded to load a suitable font in RAM, but SCREEN=0x11 should not be used.
I guess it's a driver like the DISPLAY.SYS that I mentioned earlier?
Thanks again! I agree that DISPLAY.SYS is likely one driver that should do.
Regarding the text fonts: Before writing a font to VGA memory, the Sequencer Data Register's "Memory Plane Write Enable" field (0x3C5, index 0x02) is set to 0x04 (bit 2, which enables writes to the third plane, i.e. plane 2 when counting from zero), and after finishing it's set back to 0x03 (writes to the first two planes). Using this I can log in VGAScreen.port3C5_write() when the OS begins and ends writing to the font bitmaps. I also patched VGAScreen.vga_memory_write() so it no longer discards the font bitmap data while the font plane is being written, and routed that data into a separate buffer, this.plane3[], so it doesn't trash the text screen (which it otherwise does, so there is data coming in).
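The logging idea can be sketched in isolation like this. The class `FontWriteTracker` is hypothetical and stands apart from VGAScreen; it only models the Sequencer index port (0x3C4) and data port (0x3C5) and treats a write-enable mask of exactly 0x04 as "font upload in progress", which matches the values observed above but is an assumption about how guests behave in general.

```javascript
// Tracks writes to the Sequencer's plane write-enable field (index 0x02)
// and records when a font upload appears to begin and end.
class FontWriteTracker {
    constructor() {
        this.index = 0;               // last value written to port 0x3C4
        this.fontWriteActive = false; // true while only the font plane is enabled
        this.log = [];
    }

    // Guest writes the Sequencer index register (port 0x3C4).
    writeIndex(value) {
        this.index = value & 0xFF;
    }

    // Guest writes the Sequencer data register (port 0x3C5).
    writeData(value) {
        if (this.index !== 0x02) {
            return; // some other Sequencer register, not the write-enable mask
        }
        const active = (value & 0x0F) === 0x04;
        if (active && !this.fontWriteActive) {
            this.log.push("font write begins");
        } else if (!active && this.fontWriteActive) {
            this.log.push("font write ends");
        }
        this.fontWriteActive = active;
    }
}
```

The "font write ends" transition would be a natural point to rebuild any cached glyph data.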
Next I tested 4 different OSes to see how they modify the VGA text font at boot time and later.
- MSDOS622/en (CP437): writes once to font buffer at boot time
- MSDOS622/de (CP850): writes twice
- FreeDOS13/en (CP437): writes once
- FreeDOS13/de (CP850): writes three times
The first write always occurs right at the start of booting (when the BIOS presents its boot menu), so this must be the CP437 font from the BIOS. The second one comes a bit later and must be the switch to CP850 in FDAUTO.BAT/AUTOEXEC.BAT. I am not sure yet what the third write access from FreeDOS13/de is, but this looks really promising.
I then launched a game under FreeDOS13/de to switch to graphics mode, exited back to text mode, and indeed, the font buffers are being written to again twice after leaving the game, the first should be CP437 and the second CP850.
I think adding an alternate text mode to VGAScreen is a way to integrate this new feature as a non-breaking change. Consider a new "graphical text mode" for VGAScreen, if active then VGAScreen...
- stays in graphical "canvas" mode for all VGA video modes (text and graphical)
- renders text mode using true VGA fonts (as opposed to the browser's Unicode font)
- handles text-related bus events internally (instead of posting them to the bus), that is:
- screen-put-char
- screen-update-cursor
- screen-update-cursor-scanline
- screen-set-size-text (instead: screen-set-size-graphical)
VGAScreen registers for some new bus event to allow this alternate text mode to be enabled/disabled, and by default it's disabled.
Would this be ok?
Thanks for your notes, they are very helpful for me!
I tried to make a small demo that gives a view of the font from plane 2 (first you need to recompile libv86.js with the included patch): https://gist.github.com/SuperMaxusa/7c8329c3f9e41db5114d57046870de03
It updates the canvas once per second, but I think a better solution is to register an event like "vga-font-plane-write" and call displayFont() from it.
Also you can grab freedos13.img for testing here: freedos13.tar.gz
By default FreeDOS' display driver is not loaded on startup, so we get the CP437 8x16 glyphs:
screenshot
You can load a CP850 charset manually with these commands:
lh A:\FDOS\BIN\DISPLAY.EXE CON=(EGA,850,1)
A:\FDOS\BIN\MODE CON CP PREP=((850) A:\CPI\EGA.CPX)
A:\FDOS\BIN\MODE CON CP SEL=850
A:\FDOS\BIN\MODE CON CP REFRESH
A:\FDOS\BIN\MODE CON CP /STATUS
screenshot
You can try to load FNT font using gnuchcp:
gnuchcp.exe A:\gnufonts\<name>.fnt
(for resetting use gnuchcp.exe -r)
screenshot
When Fontraption (command: A:\FRAPT\FRAPT.COM) starts, it changes some glyphs for its interface (but it is buggy and the glyphs don't change in the preview):
screenshot + comparing changes
And some tests of changing VGA modes:
When the Magiduck game (commands: cd A:\MAGI, then DUCK.EXE) starts, it switches to 40x25 text mode with 8x8 glyphs. On the font canvas this looks glitched, because the 8x8 font overwrites only part of the previous font, and a character's glyph is cut off at the maximum scan line when it is read: http://www.osdever.net/FreeVGA/vga/char.txt. For now I have no idea how to get this maximum scan line from the hardware side, the way int 10h, ax=1130h does.
screenshot
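One possible answer to the maximum scan line question, as far as I read the FreeVGA register documentation: the CRTC's Maximum Scan Line Register (index 0x09) holds the character height minus one in its low five bits. The sketch below is built on that reading; the function name is hypothetical and nothing here has been checked against vga.js.

```javascript
// Derive the character cell height from the raw value of the CRTC
// Maximum Scan Line Register (index 0x09). Bits 0..4 hold height - 1;
// the upper bits carry unrelated flags and are masked off.
function charHeightFromMaxScanLine(crtcReg09) {
    return (crtcReg09 & 0x1F) + 1;
}

// Example: 0x4F yields a 16-line cell, 0x47 an 8-line cell.
console.log(charHeightFromMaxScanLine(0x4F), charHeightFromMaxScanLine(0x47));
```

With that height, the font preview could draw only the first N rows of each 32-byte glyph slot and ignore the stale rows left behind by a previously loaded taller font.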
When I switch to graphics mode like via this asm code:
mov ah, 0
mov al, 13h
int 10h
...graphics mode also overwrites bitplanes, along with the font bitplane.
screenshot
You're very welcome! In fact I was a bit worried if my notes were too much. :)
Really impressed by the clever demo you made there, love to see those font dumps as I know what to look for (that surely is a CP850)! I wasn't aware that you are also working on this, great!
I installed your demo, reduced the sleep from 1000 to 50ms, and booted it up with my FreeDOS/de HDA (that image is in German, so FreeDOS sets CP850 in FDAUTO.BAT). It behaves exactly as expected. I also checked out gnuchcp and frapt from your floppy; they are very useful tools which come in really handy.
Thanks to your demo I think the basic concept is clear now.
More notes:
I've begun a deep dive into the text-mode related VGA registers, and I'm surprised by the variety of possible configurations, which shows just how important text mode was back in the day.
An unsorted list of things that should be supported/considered (in my opinion):
- font width of 8 or 9 pixels: any font height is ok (up to 32), but font width 9 has special rules
- 200, 350, 400 scan lines
- text cursor, blink text attribute
- VGA supports 8 character sets (8K each) in font plane 2 (64K)
- up to 2 of the 8 character sets (called "A" and "B") may be active simultaneously
- the 9th column of fonts having width 9 is implicit (meaning not explicitly stored in the bitmap)
- Line Graphics Enable (LGA): duplicate 8th to 9th column in horizontal line drawing characters for fonts having width 9
A problem I see is that the state of the "text rendering machine" is scattered over about a dozen different VGA register fields that can each be changed individually at any time, there's no thing like a "transaction" that would tell us when the text rendering state has transitioned from one consistent state to another. But, rendering will be clocked by the browser's requestAnimationFrame() loop, it can hit in the middle of the OS changing VGA registers. This might cause unpredictable flicker during VGA state transitions.
I think rendering could be simplified if the VGA's raw font bitmaps in plane 2 were transformed into a simple, flat array of booleans (with no gaps between glyphs, simply 256 * font_width * font_height) whenever the font's size or shape has changed. The implicit 9th column and LGA (both affect font shape) could be incorporated into that simple array to keep this stuff out of the rendering loop.
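A sketch of that transformation, under a few stated assumptions: plane 2 is a Uint8Array with 32-byte glyph slots, only character set 0 is converted, the font width is 8 or 9, and with Line Graphics Enable set the 9th column copies the 8th only for character codes 0xC0..0xDF (otherwise it stays background). The function name `flattenFont` is hypothetical.

```javascript
// Convert the raw glyph bitmaps into a flat array of 0/1 pixels with layout
// [char][row][col] and no gaps: 256 * fontWidth * fontHeight entries.
function flattenFont(plane2, fontHeight, fontWidth, lineGraphicsEnable) {
    const flat = new Uint8Array(256 * fontWidth * fontHeight);
    let out = 0;
    for (let ch = 0; ch < 256; ch++) {
        // Only the box-drawing range extends its 8th column into the 9th.
        const copyLastColumn = lineGraphicsEnable && ch >= 0xC0 && ch <= 0xDF;
        for (let row = 0; row < fontHeight; row++) {
            const bits = plane2[ch * 32 + row];
            for (let col = 0; col < 8; col++) {
                flat[out++] = (bits >> (7 - col)) & 1;
            }
            if (fontWidth === 9) {
                // The implicit 9th column is materialized once, here.
                flat[out++] = copyLastColumn ? bits & 1 : 0;
            }
        }
    }
    return flat;
}
```

The render loop can then index a pixel as `flat[(ch * fontHeight + row) * fontWidth + col]` without knowing anything about the 9th column or LGA.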
This might cause unpredictable flicker during VGA state transitions.
You mean like the race condition between canvas and frame updates?
By the way, I have noticed that TextCanvas.render() uses performance.now() for the blinking effect. How about using a frame counter[^1] (as done in PCjs) or the requestAnimationFrame() callback's timestamp for this?
[^1]: PCjs "blinks" in text mode every 10 frames (about 170 ms), which is close to real hardware. According to http://www.osdever.net/FreeVGA/vga/textcur.htm#blink, the blink rate for VGA is 16 frames (about 260 ms).
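A frame-counter based blink could be sketched as below. The function name and the default of 16 frames per phase are assumptions taken from the figures above, and the sketch treats "blink rate" as the number of frames between toggles, which may not be the exact hardware definition.

```javascript
// Decide whether blinking content is visible for a given frame number.
// The state toggles every framesPerPhase frames, starting in the visible phase.
function blinkPhaseVisible(frameCounter, framesPerPhase = 16) {
    return Math.floor(frameCounter / framesPerPhase) % 2 === 0;
}

// Intended use inside a render loop (browser only, shown as a comment):
//   let frame = 0;
//   function tick() {
//       render(blinkPhaseVisible(frame++));
//       requestAnimationFrame(tick);
//   }
```

Because the phase depends only on the number of rendered frames, it stays in step with the repaint loop instead of with wall-clock time.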
- font width of 8 or 9 pixels: any font height is ok (up to 32), but font width 9 has special rules
It can also be 16 (2x width scaling), but not 18. For example, Tetris in the MS-DOS profile.