dexed icon indicating copy to clipboard operation
dexed copied to clipboard

Dexed does not handle (invalid) high bytes in sysex data gracefully

Open eclab opened this issue 6 years ago • 6 comments

There are quite a number of available sysex files online which have high bytes in the middle of their sysex data erroneously. Typically this isn't garbled data, just data with the high bit set for no reason. Dexed can't parse these correctly.

I attach one example. Consider the patch "SYNCLAVIER" (or as Dexed puts it "SY C A VE R"), whose parameters (and name) Dexed interprets hillariously incorrectly.

This is an easy fix: Dexed just needs to strip out all high bits between the F0 and F7, then parse from there.

BANK0054.SYX.zip

eclab avatar Nov 01 '19 03:11 eclab

Looks like adding the line

c &= 0x7F; // strip don't care most-significant bit from name and modifying the compare after default: to if (c < 32) would do the trick.

    static String normalizePgmName(const char *sysexName) {
        char buffer[11];
        
        memcpy(buffer, sysexName, 10);
        
        for (int j = 0; j < 10; j++) {
            char c = (unsigned char) buffer[j];
            c &= 0x7F; // strip don't care most-significant bit from name
            switch (c) {
                case 92:
                    c = 'Y';
                    break; /* yen */
                case 126:
                    c = '>';
                    break; /* >> */
                case 127:
                    c = '<';
                    break; /* << */
                default:
                    if (c < 32)
                        c = 32;
                    break;
            }
            buffer[j] = c;
        }
        buffer[10] = 0;
        
        return String(buffer);
    }

elovian avatar Oct 08 '20 23:10 elovian

Thanks for the suggestion. This would be a quick win.

But the real implementation would be to find the correct charset that Yamaha used for this. Then Yen symbol was not at position 92 on the original codepage 437 it was at position 157. Once we map the right character, we might be able to see them correctly in the UI.

asb2m10 avatar Oct 14 '20 03:10 asb2m10

At least for my example, the issue was not charsets. It was high bits incorrectly being stuck in the middle of sysex data (non-status) bytes, which I've found in a number of DX7 files.

eclab avatar Oct 14 '20 03:10 eclab

92 is the correct position for the yen character on early 8bit Japanese systems. They used a character set called JIS-8 http://ss.sguc.ac.jp/~rider/basic/jis8code.html Which is mostly but but completely similar to ASCII. And still leads to occasional odd characters when reading text files still.

I think it's that the high bit is likely Don't Care but we can wait for confirmation from someone with a DX7.

2020年10月13日(火) 22:17 Evolutionary Computation Laboratory < [email protected]>:

At least for my example, the issue was not charsets. It was high bits incorrectly being stuck in the middle of sysex data (non-status) bytes, which I've found in a number of DX7 files.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/asb2m10/dexed/issues/194#issuecomment-708129920, or unsubscribe https://github.com/notifications/unsubscribe-auth/ARICNLGD3ZIHELR46Y4QAFDSKUJ27ANCNFSM4JHVYQ3A .

elovian avatar Oct 14 '20 03:10 elovian

This may all be true regarding charsets but it seems to be to be irrelevant to the bug report I had posted: sysex messages simply cannot have high bits set. Multiple parameters (not just the name) in my example are set wrong due to invalid high bits. And the name "SYNCLAVIER" doesn't contain a Yen anyway: it just has high bits incorrectly set for several of the (ASCII) characters in the name.

eclab avatar Oct 14 '20 03:10 eclab

I've just merge @elovian fix that is based on the program, e.g. only the programs loaded/saved by Dexed will have the values normalized. See nightly builds. This should fix @eclab issue.

But to do the correct implementation, we should use JIS-8 for program code page also.

asb2m10 avatar Oct 16 '20 04:10 asb2m10