HDL-deflate
HDL-deflate copied to clipboard
First 5 bytes are "zeroed" out
Hi, I set up deflate.py with a 32768 OBSIZE and COMPRESS = False. I wrote C code based on test_deflate.py. When it runs, the first 5 bytes that are output on o_byte are zeroed out, but after that, the bytes look correct. The bytes are not coming out shifted. Byte 5 is what byte 5 is supposed to be, when I compared the output with the output of a software inflate function. The code is attached, thanks for any suggestions! (I'm particular wondering if there's an issue in InitDecompress, WriteDecompress, ReadDecompress, or DeflateClockStrobe functions.)
@ryanjpearson I will have a look later this week...
The PNG code makes it a bit harder to find the root cause.
Could you first change your example C code to test the decompression of (EDIT):
static char ctest[] = "\x18\x95u\xcf\xbb\t\x80@\x00\x04\xd1V\xd6\x0e\xdc\xf3\xdf\x81\x1d\xd8\x80\x66\x07\x07\xf6\x1f\x18\t\n;\x13N\xf6$\xedW\xadMG\xbb\xeb\xd9\xa9\xd7\xdbo;\xef\x92\xf7\x90\xf7\x98\xf7\x94\xf7\x9c\xf7\x92\xf7\x9a\xf7\x06\x1c\x62\x82\xd3\x00\x35H\rT\x83\xd5\x80\x35h\r\\\x83\xb7|\xbc\x0fjCzP";
which should result in:
" Hello World! 0 Hello World! 1 Hello World! 2 Hello World! 3 Hello World! 4 Hello World! 5 Hello World! 6 Hello World! 7 Hello World! 8 Hello World! 9 Hello World! 10 Hello World! 11 Hello World! 12 Hello World! 13 Hello World! 14 Hello World! 15 Hello World! 16 Hello World! 17 Hello World! 18 Hello World! 19 Hello World! 20 H"
Here's a simpler implemenation using the test case you provided. Thanks!
I assume you verified that the first 5 bytes are still damaged (0) and the rest of the output is ok?
Regarding the test data, why don't you just use (EDIT corrected wrong escapes):
static unsigned char Compressed[] = "\x18\x95u\xcf\xbb\t\x80@\x00\x04\xd1V\xd6\x0e\xdc\xf3\xdf\x81\x1d\xd8\x80\x66\x07\x07\xf6\x1f\x18\t\n;\x13N\xf6$\xedW\xadMG\xbb\xeb\xd9\xa9\xd7\xdbo;\xef\x92\xf7\x90\xf7\x98\xf7\x94\xf7\x9c\xf7\x92\xf7\x9a\xf7\x06\x1c\x62\x82\xd3\x00\x35H\rT\x83\xd5\x80\x35h\r\\\x83\xb7|\xbc\x0fjCzP";
In InitDecompress
DeflateWrite(REG_DEFLATE_RESET, 1);
DeflateWrite(REG_I_CLK, 0);
DeflateWrite(REG_DEFLATE_RESET, 0);
DeflateWrite(REG_I_CLK, 1);
Has the clock edge reversed, try changing this to:
DeflateWrite(REG_I_CLK, 0);
DeflateWrite(REG_DEFLATE_RESET, 1);
DeflateWrite(REG_I_CLK, 1); // Low->High transition to reset
DeflateWrite(REG_DEFLATE_RESET, 0);
Let me know if that makes a difference. If not I'll have a look at the rest of the code.
Could you also tell me a bit more of your use case? Why are you driving/testing the decompress logic from C code?
That will be very slow, just using the processor and a software implementation would have the same or higher speed.
test_deflate_bench in test_deflate.py is an example driving the logic directly.
Thanks for the suggestion! I'm away from the hardware for 2 weeks, but I will give it a shot then.
Isn't the string you provided an ascii representation of the data? I used binascii.b2a_hex in python to convert it to the bytes in the code. I wasn't sure of any C functions that would convert "/x95" to just a char with value of 0x95. I'm new to that notation, and I thought it was python specific.
I wasn't certain the micro-controller I'm using would have the memory needed for inflate. I may move more functionality to the FPGA, but I first wanted to verify I was able to interface to the inflate logic correctly.
There is indeed an issue with 4 (3 reported by the compiler because it exceeds the 0-255 range) of the encoded bytes:
tom@swan-ubu20:~/src/HDL-deflate$ cat test.c
#include <stdio.h>
int main()
{
static unsigned char Compressed[] = "\x18\x95u\xcf\xbb\t\x80@\x00\x04\xd1V\xd6\x0e\xdc\xf3\xdf\x81\x1d\xd8\x80f\x07\x07\xf6\x1f\x18\t\n;\x13N\xf6$\xedW\xadMG\xbb\xeb\xd9\xa9\xd7\xdbo;\xef\x92\xf7\x90\xf7\x98\xf7\x94\xf7\x9c\xf7\x92\xf7\x9a\xf7\x06\x1cb\x82\xd3\x005H\rT\x83\xd5\x805h\r\\\x83\xb7|\xbc\x0fjCzP";
for (int i= 0; i < sizeof(Compressed)-1; i++)
printf("%02x ", Compressed[i]);
printf("\n");
}
tom@swan-ubu20:~/src/HDL-deflate$ clang test.c;
test.c:5:109: error: hex escape sequence out of range
static unsigned char Compressed[] = "\x18\x95u\xcf\xbb\t\x80@\x00\x04\xd1V\xd6\x0e\xdc\xf3\xdf\x81\x1d\xd8\x80f\x07\x07\xf6\x1f\x18\t\n;\x13N\xf6$\xedW\xadMG\xbb\xeb\xd9\xa9\xd7\xdbo;\xef\x92\xf7\x90\xf7\x98\xf7\x94\xf7\x9c\xf7\x92\xf7\x9a\xf7\x06\x1cb\x82\xd3\x005H\rT\x83\xd5\x805h\r\\\x83\xb7|\xbc\x0fjCzP";
^~~~~
test.c:5:250: error: hex escape sequence out of range
static unsigned char Compressed[] = "\x18\x95u\xcf\xbb\t\x80@\x00\x04\xd1V\xd6\x0e\xdc\xf3\xdf\x81\x1d\xd8\x80f\x07\x07\xf6\x1f\x18\t\n;\x13N\xf6$\xedW\xadMG\xbb\xeb\xd9\xa9\xd7\xdbo;\xef\x92\xf7\x90\xf7\x98\xf7\x94\xf7\x9c\xf7\x92\xf7\x9a\xf7\x06\x1cb\x82\xd3\x005H\rT\x83\xd5\x805h\r\\\x83\xb7|\xbc\x0fjCzP";
^~~~~
test.c:5:280: error: hex escape sequence out of range
static unsigned char Compressed[] = "\x18\x95u\xcf\xbb\t\x80@\x00\x04\xd1V\xd6\x0e\xdc\xf3\xdf\x81\x1d\xd8\x80f\x07\x07\xf6\x1f\x18\t\n;\x13N\xf6$\xedW\xadMG\xbb\xeb\xd9\xa9\xd7\xdbo;\xef\x92\xf7\x90\xf7\x98\xf7\x94\xf7\x9c\xf7\x92\xf7\x9a\xf7\x06\x1cb\x82\xd3\x005H\rT\x83\xd5\x805h\r\\\x83\xb7|\xbc\x0fjCzP";
^~~~~
3 errors generated.
eg \x80f should have been encoded \x80\x66 because f is a hex digit. This is different from Python encoding, where the encoding is limited to 2 hex characters \xHH.
Corrected:
tom@swan-ubu20:~/src/HDL-deflate$ cat test.c
#include <stdio.h>
int main()
{
static unsigned char Compressed[] = "\x18\x95u\xcf\xbb\t\x80@\x00\x04\xd1V\xd6\x0e\xdc\xf3\xdf\x81\x1d\xd8\x80\x66\x07\x07\xf6\x1f\x18\t\n;\x13N\xf6$\xedW\xadMG\xbb\xeb\xd9\xa9\xd7\xdbo;\xef\x92\xf7\x90\xf7\x98\xf7\x94\xf7\x9c\xf7\x92\xf7\x9a\xf7\x06\x1c\x62\x82\xd3\x00\x35H\rT\x83\xd5\x80\x35h\r\\\x83\xb7|\xbc\x0fjCzP";
for (int i= 0; i < sizeof(Compressed)-1; i++)
printf("%02x%c", Compressed[i], (i+1)%16 ? ' ': '\n');
printf("\n");
}
tom@swan-ubu20:~/src/HDL-deflate$ clang test.c; ./a.out
18 95 75 cf bb 09 80 40 00 04 d1 56 d6 0e dc f3
df 81 1d d8 80 66 07 07 f6 1f 18 09 0a 3b 13 4e
f6 24 ed 57 ad 4d 47 bb eb d9 a9 d7 db 6f 3b ef
92 f7 90 f7 98 f7 94 f7 9c f7 92 f7 9a f7 06 1c
62 82 d3 00 35 48 0d 54 83 d5 80 35 68 0d 5c 83
b7 7c bc 0f 6a 43 7a 50
I used your suggestion and fixed a couple of mistakes in the decompress.txt. Attached is the latest. decompress.txt
The output has bytes 0 through 18, 26, and 40 "zeroed" out. The rest of the bytes are what they should be.
Here's a screen shot of the first 46 bytes of output

The output has bytes 0 through 18, 26, and 40 "zeroed" out. The rest of the bytes are what they should be.
What happens if you do 2 clock ticks around line 100:
if (DecompressedReadCount < GetOProgress()) {
did_read = 1;
DeflateWrite(REG_I_MODE, READ);
DeflateWrite(REG_I_RADDR, DecompressedReadCount);
DeflateClockStrobe(); DeflateClockStrobe(); // TWO TICKS
DecompressedReadCount++;
}
That made the problem go away! Thank you!
That made the problem go away! Thank you!
That's good news! You will probably only need the extra tick when
DecompressedReadCount + 1 == GetOProgress()
Now, I'm trying to push through a deflate block of size 705 bytes. o_iprogress is stalling at 481, meaning I'm only able to write 513 bytes. This number is interesting (481 = 512 - 32 + 1). Any suggestions? Thanks!
The standard input buffer is 512 bytes and cyclic. 32 bytes are reserved for look back when compressing (could also apply when decompressing).
So if you are feeding data to decompress and it does not fit completely in the input buffer, you should also start the decompression and read the result so that the input buffer is emptied and you can reuse it in a cyclic way (after writing slot 511, continue with slot 0).
See the test code around:
https://github.com/tomtor/HDL-deflate/blob/9a561a6386d20d6e1df8b70693c4a81a6b648161/test_deflate.py#L159
Following the example of test_deflate.py, I believe I start decompression at the beginning of the process. Line 59 in decompress.txt: DeflateWrite(REG_I_MODE, STARTD);
And I have an analogous check on i_progress. Line 66 in decompress.txt: if ((int)GetIProgress() > CompressedWriteCount - MAXW)
If you're able to take a look, I attached a zip file with the small png I'm testing with. And there's a binary file with just the compressed block that's inside the png. The compress block has a size of 705 bytes. It should inflate to a size of 5656 bytes (100x56 PNG 8-bit with 1 filter byte per line. 101*56=5656).
Thanks!
Can you print GetIProgress() and GetOProgress() and CompressedWriteCount in the inner loop when writing the input data, to get some debug info?
Both Progress indicators should start progressing after a number of clock cycles
Here you go! It prints each time I'm able to write a byte to i_data and CompressedWriteCount increments. Thanks
Your output ends with:
IProgress: 478 OProgress: 3699 CompressedWriteCount: 510 NumberOfClocks: 78251
IProgress: 480 OProgress: 3713 CompressedWriteCount: 511 NumberOfClocks: 78278
IProgress: 480 OProgress: 3713 CompressedWriteCount: 512 NumberOfClocks: 78281
IProgress: 481 OProgress: 3713 CompressedWriteCount: 513 NumberOfClocks: 78284
That looks good!
I expect that at the end of your log you are no longer repeatedly calling DeflateClockStrobe().
That is needed in order to advance IProgress to slot 482 and beyond.
Also note CompressedWriteCount: 513.
You have just written byte 513, do you store this byte at slot 0, because the buffer is sized 512 bytes by default.
It is still repeatedly calling DeflateClockStrobe() after that. It's only printing when it's able to write a byte ( meaning ((int)GetIProgress() > CompressedWriteCount - MAXW)). And the write addresses do wrap around back to zero since the register is 9 bits. Byte 513 is written to address 1.
Another thing: the o_done bit goes high after byte 449 is written ( I think it was 449, I'll confirm tomorrow). I was under the belief that png contains just one deflate block, but I think that is false now. So, it may be that deflate is working fine, and I just need to start a new block. I will check this tomorrow. My apologies for reporting an error if this is the case.
uint8_t WriteDecompress(uint8_t CompressedByte)
{
uint8_t ret;
if ((int)GetIProgress() > CompressedWriteCount - MAXW)
{
DeflateWrite(REG_I_MODE, WRITE);
DeflateWrite(REG_I_WADDR, CompressedWriteCount);
DeflateWrite(REG_I_DATA, CompressedByte);
CompressedWriteCount++;
ret = 0; //byte written
}
else
{
DeflateWrite(REG_I_MODE, IDLEM);
ret = 1; //byte not written, waiting for space
}
return ret;
}
is different from the Python code. You switch to IDLE when waiting, the Python code switches to IDLE when the last input byte is written...
https://github.com/tomtor/HDL-deflate/blob/9a561a6386d20d6e1df8b70693c4a81a6b648161/test_deflate.py#L169
Ah thanks, I will correct this.
Another thing: the o_done bit goes high after byte 449 is written ( I think it was 449, I'll confirm tomorrow). I was under the belief that png contains just one deflate block, but I think that is false now. So, it may be that deflate is working fine, and I just need to start a new block. I will check this tomorrow. My apologies for reporting an error if this is the case.
IProgress should not progress after the end of a block, so I do not think there is a block end at position 449 or before 481.
If the end of a block is reached and there are more blocks, do I need to start a new decompression and write STARTD to i_mode again? Or do I just keep writing in the compressed data?
Or do I just keep writing in the compressed data?
That should work, it does handle multiple blocks in a stream.
I do not think that your issue is related to blocks, it is somehow related to writing input byte 513 (buffersize + 1).
You could increase the buffersize to 1024 and it will most likely stop at position 1025.
Ah thanks, I will correct this.
This made no difference?
Ah thanks, I will correct this.
This made no difference? It did not make a difference, it still stalled at the same spot.
I also noticed that my cwindow size was wrong, I had it set to 32 in my code. Since I had Compress = False and Lowlut = False, the CWINDOW size should be 256. I made this fix in the code, and it did not seem to matter. It still stalls after writing the 513th byte to address 0. I tried a build with IBSIZE = 32768. The o_done bit still goes high after i_progress reaches 449.