HDL-deflate icon indicating copy to clipboard operation
HDL-deflate copied to clipboard

First 5 bytes are "zeroed" out

Open ryanjpearson opened this issue 4 years ago • 27 comments
trafficstars

Hi, I set up deflate.py with a 32768 OBSIZE and COMPRESS = False. I wrote C code based on test_deflate.py. When it runs, the first 5 bytes that are output on o_byte are zeroed out, but after that, the bytes look correct. The bytes are not coming out shifted. Byte 5 is what byte 5 is supposed to be, when I compared the output with the output of a software inflate function. The code is attached, thanks for any suggestions! (I'm particular wondering if there's an issue in InitDecompress, WriteDecompress, ReadDecompress, or DeflateClockStrobe functions.)

decompress.txt

ryanjpearson avatar Dec 21 '20 14:12 ryanjpearson

@ryanjpearson I will have a look later this week...

tomtor avatar Dec 21 '20 21:12 tomtor

The PNG code makes it a bit harder to find the root cause.

Could you first change your example C code to test the decompression of (EDIT):

static char ctest[] = "\x18\x95u\xcf\xbb\t\x80@\x00\x04\xd1V\xd6\x0e\xdc\xf3\xdf\x81\x1d\xd8\x80\x66\x07\x07\xf6\x1f\x18\t\n;\x13N\xf6$\xedW\xadMG\xbb\xeb\xd9\xa9\xd7\xdbo;\xef\x92\xf7\x90\xf7\x98\xf7\x94\xf7\x9c\xf7\x92\xf7\x9a\xf7\x06\x1c\x62\x82\xd3\x00\x35H\rT\x83\xd5\x80\x35h\r\\\x83\xb7|\xbc\x0fjCzP";

which should result in:

"   Hello World! 0         Hello World! 1         Hello World! 2         Hello World! 3         Hello World! 4         Hello World! 5         Hello World! 6         Hello World! 7         Hello World! 8         Hello World! 9         Hello World! 10         Hello World! 11         Hello World! 12         Hello World! 13         Hello World! 14         Hello World! 15         Hello World! 16         Hello World! 17         Hello World! 18         Hello World! 19         Hello World! 20         H"

tomtor avatar Dec 22 '20 09:12 tomtor

Here's a simpler implemenation using the test case you provided. Thanks!

decompress.txt

ryanjpearson avatar Dec 23 '20 18:12 ryanjpearson

Here's a simpler implemenation using the test case you provided. Thanks!

I assume you verified that the first 5 bytes are still damaged (0) and the rest of the output is ok?

tomtor avatar Dec 23 '20 23:12 tomtor

Regarding the test data, why don't you just use (EDIT corrected wrong escapes):

static unsigned char Compressed[] = "\x18\x95u\xcf\xbb\t\x80@\x00\x04\xd1V\xd6\x0e\xdc\xf3\xdf\x81\x1d\xd8\x80\x66\x07\x07\xf6\x1f\x18\t\n;\x13N\xf6$\xedW\xadMG\xbb\xeb\xd9\xa9\xd7\xdbo;\xef\x92\xf7\x90\xf7\x98\xf7\x94\xf7\x9c\xf7\x92\xf7\x9a\xf7\x06\x1c\x62\x82\xd3\x00\x35H\rT\x83\xd5\x80\x35h\r\\\x83\xb7|\xbc\x0fjCzP";

In InitDecompress

DeflateWrite(REG_DEFLATE_RESET, 1);
DeflateWrite(REG_I_CLK, 0);
DeflateWrite(REG_DEFLATE_RESET, 0);
DeflateWrite(REG_I_CLK, 1);

Has the clock edge reversed, try changing this to:

DeflateWrite(REG_I_CLK, 0); 
DeflateWrite(REG_DEFLATE_RESET, 1);
DeflateWrite(REG_I_CLK, 1); // Low->High transition to reset
DeflateWrite(REG_DEFLATE_RESET, 0);

Let me know if that makes a difference. If not I'll have a look at the rest of the code.

Could you also tell me a bit more of your use case? Why are you driving/testing the decompress logic from C code?

That will be very slow, just using the processor and a software implementation would have the same or higher speed.

test_deflate_bench in test_deflate.py is an example driving the logic directly.

tomtor avatar Dec 24 '20 12:12 tomtor

Thanks for the suggestion! I'm away from the hardware for 2 weeks, but I will give it a shot then.

Isn't the string you provided an ascii representation of the data? I used binascii.b2a_hex in python to convert it to the bytes in the code. I wasn't sure of any C functions that would convert "/x95" to just a char with value of 0x95. I'm new to that notation, and I thought it was python specific.

I wasn't certain the micro-controller I'm using would have the memory needed for inflate. I may move more functionality to the FPGA, but I first wanted to verify I was able to interface to the inflate logic correctly.

ryanjpearson avatar Dec 24 '20 16:12 ryanjpearson

There is indeed an issue with 4 (3 reported by the compiler because it exceeds the 0-255 range) of the encoded bytes:

tom@swan-ubu20:~/src/HDL-deflate$ cat test.c 
#include <stdio.h>

int main()
{
  static unsigned char Compressed[] = "\x18\x95u\xcf\xbb\t\x80@\x00\x04\xd1V\xd6\x0e\xdc\xf3\xdf\x81\x1d\xd8\x80f\x07\x07\xf6\x1f\x18\t\n;\x13N\xf6$\xedW\xadMG\xbb\xeb\xd9\xa9\xd7\xdbo;\xef\x92\xf7\x90\xf7\x98\xf7\x94\xf7\x9c\xf7\x92\xf7\x9a\xf7\x06\x1cb\x82\xd3\x005H\rT\x83\xd5\x805h\r\\\x83\xb7|\xbc\x0fjCzP";
  for (int i= 0; i < sizeof(Compressed)-1; i++)
	  printf("%02x ", Compressed[i]);
  printf("\n");
}
tom@swan-ubu20:~/src/HDL-deflate$ clang test.c; 
test.c:5:109: error: hex escape sequence out of range
  static unsigned char Compressed[] = "\x18\x95u\xcf\xbb\t\x80@\x00\x04\xd1V\xd6\x0e\xdc\xf3\xdf\x81\x1d\xd8\x80f\x07\x07\xf6\x1f\x18\t\n;\x13N\xf6$\xedW\xadMG\xbb\xeb\xd9\xa9\xd7\xdbo;\xef\x92\xf7\x90\xf7\x98\xf7\x94\xf7\x9c\xf7\x92\xf7\x9a\xf7\x06\x1cb\x82\xd3\x005H\rT\x83\xd5\x805h\r\\\x83\xb7|\xbc\x0fjCzP";
                                                                                                            ^~~~~
test.c:5:250: error: hex escape sequence out of range
  static unsigned char Compressed[] = "\x18\x95u\xcf\xbb\t\x80@\x00\x04\xd1V\xd6\x0e\xdc\xf3\xdf\x81\x1d\xd8\x80f\x07\x07\xf6\x1f\x18\t\n;\x13N\xf6$\xedW\xadMG\xbb\xeb\xd9\xa9\xd7\xdbo;\xef\x92\xf7\x90\xf7\x98\xf7\x94\xf7\x9c\xf7\x92\xf7\x9a\xf7\x06\x1cb\x82\xd3\x005H\rT\x83\xd5\x805h\r\\\x83\xb7|\xbc\x0fjCzP";
                                                                                                                                                                                                                                                         ^~~~~
test.c:5:280: error: hex escape sequence out of range
  static unsigned char Compressed[] = "\x18\x95u\xcf\xbb\t\x80@\x00\x04\xd1V\xd6\x0e\xdc\xf3\xdf\x81\x1d\xd8\x80f\x07\x07\xf6\x1f\x18\t\n;\x13N\xf6$\xedW\xadMG\xbb\xeb\xd9\xa9\xd7\xdbo;\xef\x92\xf7\x90\xf7\x98\xf7\x94\xf7\x9c\xf7\x92\xf7\x9a\xf7\x06\x1cb\x82\xd3\x005H\rT\x83\xd5\x805h\r\\\x83\xb7|\xbc\x0fjCzP";
                                                                                                                                                                                                                                                                                       ^~~~~
3 errors generated.

eg \x80f should have been encoded \x80\x66 because f is a hex digit. This is different from Python encoding, where the encoding is limited to 2 hex characters \xHH.

Corrected:

tom@swan-ubu20:~/src/HDL-deflate$ cat test.c 
#include <stdio.h>

int main()
{
  static unsigned char Compressed[] = "\x18\x95u\xcf\xbb\t\x80@\x00\x04\xd1V\xd6\x0e\xdc\xf3\xdf\x81\x1d\xd8\x80\x66\x07\x07\xf6\x1f\x18\t\n;\x13N\xf6$\xedW\xadMG\xbb\xeb\xd9\xa9\xd7\xdbo;\xef\x92\xf7\x90\xf7\x98\xf7\x94\xf7\x9c\xf7\x92\xf7\x9a\xf7\x06\x1c\x62\x82\xd3\x00\x35H\rT\x83\xd5\x80\x35h\r\\\x83\xb7|\xbc\x0fjCzP";
  for (int i= 0; i < sizeof(Compressed)-1; i++)
	  printf("%02x%c", Compressed[i], (i+1)%16 ? ' ': '\n');
  printf("\n");
}
tom@swan-ubu20:~/src/HDL-deflate$ clang test.c; ./a.out 
18 95 75 cf bb 09 80 40 00 04 d1 56 d6 0e dc f3
df 81 1d d8 80 66 07 07 f6 1f 18 09 0a 3b 13 4e
f6 24 ed 57 ad 4d 47 bb eb d9 a9 d7 db 6f 3b ef
92 f7 90 f7 98 f7 94 f7 9c f7 92 f7 9a f7 06 1c
62 82 d3 00 35 48 0d 54 83 d5 80 35 68 0d 5c 83
b7 7c bc 0f 6a 43 7a 50 

tomtor avatar Dec 25 '20 11:12 tomtor

I used your suggestion and fixed a couple of mistakes in the decompress.txt. Attached is the latest. decompress.txt

The output has bytes 0 through 18, 26, and 40 "zeroed" out. The rest of the bytes are what they should be.

Here's a screen shot of the first 46 bytes of output output

ryanjpearson avatar Jan 07 '21 22:01 ryanjpearson

The output has bytes 0 through 18, 26, and 40 "zeroed" out. The rest of the bytes are what they should be.

What happens if you do 2 clock ticks around line 100:

if (DecompressedReadCount < GetOProgress()) {
		did_read = 1;
		DeflateWrite(REG_I_MODE, READ);
		DeflateWrite(REG_I_RADDR, DecompressedReadCount);
		DeflateClockStrobe(); DeflateClockStrobe(); // TWO TICKS
		DecompressedReadCount++;
	}

tomtor avatar Jan 08 '21 16:01 tomtor

That made the problem go away! Thank you!

ryanjpearson avatar Jan 08 '21 16:01 ryanjpearson

That made the problem go away! Thank you!

That's good news! You will probably only need the extra tick when

DecompressedReadCount + 1 == GetOProgress()

tomtor avatar Jan 08 '21 18:01 tomtor

Now, I'm trying to push through a deflate block of size 705 bytes. o_iprogress is stalling at 481, meaning I'm only able to write 513 bytes. This number is interesting (481 = 512 - 32 + 1). Any suggestions? Thanks!

ryanjpearson avatar Jan 12 '21 23:01 ryanjpearson

The standard input buffer is 512 bytes and cyclic. 32 bytes are reserved for look back when compressing (could also apply when decompressing).

So if you are feeding data to decompress and it does not fit completely in the input buffer, you should also start the decompression and read the result so that the input buffer is emptied and you can reuse it in a cyclic way (after writing slot 511, continue with slot 0).

See the test code around:

https://github.com/tomtor/HDL-deflate/blob/9a561a6386d20d6e1df8b70693c4a81a6b648161/test_deflate.py#L159

tomtor avatar Jan 15 '21 10:01 tomtor

Following the example of test_deflate.py, I believe I start decompression at the beginning of the process. Line 59 in decompress.txt: DeflateWrite(REG_I_MODE, STARTD);

And I have an analogous check on i_progress. Line 66 in decompress.txt: if ((int)GetIProgress() > CompressedWriteCount - MAXW)

If you're able to take a look, I attached a zip file with the small png I'm testing with. And there's a binary file with just the compressed block that's inside the png. The compress block has a size of 705 bytes. It should inflate to a size of 5656 bytes (100x56 PNG 8-bit with 1 filter byte per line. 101*56=5656).

png_deflate.zip

Thanks!

ryanjpearson avatar Jan 15 '21 15:01 ryanjpearson

Can you print GetIProgress() and GetOProgress() and CompressedWriteCount in the inner loop when writing the input data, to get some debug info?

Both Progress indicators should start progressing after a number of clock cycles

tomtor avatar Jan 16 '21 11:01 tomtor

Here you go! It prints each time I'm able to write a byte to i_data and CompressedWriteCount increments. Thanks

PNGDecompressOutput.txt

ryanjpearson avatar Jan 21 '21 19:01 ryanjpearson

Your output ends with:

IProgress: 478	OProgress: 3699	CompressedWriteCount: 510	 NumberOfClocks: 78251
IProgress: 480	OProgress: 3713	CompressedWriteCount: 511	 NumberOfClocks: 78278
IProgress: 480	OProgress: 3713	CompressedWriteCount: 512	 NumberOfClocks: 78281
IProgress: 481	OProgress: 3713	CompressedWriteCount: 513	 NumberOfClocks: 78284

That looks good!

I expect that at the end of your log you are no longer repeatedly calling DeflateClockStrobe().

That is needed in order to advance IProgress to slot 482 and beyond.

kad-vijlbc avatar Jan 22 '21 10:01 kad-vijlbc

Also note CompressedWriteCount: 513.

You have just written byte 513, do you store this byte at slot 0, because the buffer is sized 512 bytes by default.

kad-vijlbc avatar Jan 22 '21 10:01 kad-vijlbc

It is still repeatedly calling DeflateClockStrobe() after that. It's only printing when it's able to write a byte ( meaning ((int)GetIProgress() > CompressedWriteCount - MAXW)). And the write addresses do wrap around back to zero since the register is 9 bits. Byte 513 is written to address 1.

ryanjpearson avatar Jan 22 '21 11:01 ryanjpearson

Another thing: the o_done bit goes high after byte 449 is written ( I think it was 449, I'll confirm tomorrow). I was under the belief that png contains just one deflate block, but I think that is false now. So, it may be that deflate is working fine, and I just need to start a new block. I will check this tomorrow. My apologies for reporting an error if this is the case.

ryanjpearson avatar Jan 22 '21 11:01 ryanjpearson

uint8_t WriteDecompress(uint8_t CompressedByte)
{
	uint8_t ret;
	if ((int)GetIProgress() > CompressedWriteCount - MAXW)
	{
		DeflateWrite(REG_I_MODE, WRITE);
		DeflateWrite(REG_I_WADDR, CompressedWriteCount);
		DeflateWrite(REG_I_DATA, CompressedByte);

		CompressedWriteCount++;
		ret = 0; //byte written
	}
	else
	{
		DeflateWrite(REG_I_MODE, IDLEM);
		ret = 1; //byte not written, waiting for space
	}

	return ret;

}

is different from the Python code. You switch to IDLE when waiting, the Python code switches to IDLE when the last input byte is written...

https://github.com/tomtor/HDL-deflate/blob/9a561a6386d20d6e1df8b70693c4a81a6b648161/test_deflate.py#L169

kad-vijlbc avatar Jan 22 '21 11:01 kad-vijlbc

Ah thanks, I will correct this.

ryanjpearson avatar Jan 22 '21 11:01 ryanjpearson

Another thing: the o_done bit goes high after byte 449 is written ( I think it was 449, I'll confirm tomorrow). I was under the belief that png contains just one deflate block, but I think that is false now. So, it may be that deflate is working fine, and I just need to start a new block. I will check this tomorrow. My apologies for reporting an error if this is the case.

IProgress should not progress after the end of a block, so I do not think there is a block end at position 449 or before 481.

tomtor avatar Jan 22 '21 11:01 tomtor

If the end of a block is reached and there are more blocks, do I need to start a new decompression and write STARTD to i_mode again? Or do I just keep writing in the compressed data?

ryanjpearson avatar Jan 25 '21 20:01 ryanjpearson

Or do I just keep writing in the compressed data?

That should work, it does handle multiple blocks in a stream.

I do not think that your issue is related to blocks, it is somehow related to writing input byte 513 (buffersize + 1).

You could increase the buffersize to 1024 and it will most likely stop at position 1025.

Ah thanks, I will correct this.

This made no difference?

tomtor avatar Jan 25 '21 22:01 tomtor

Ah thanks, I will correct this.

This made no difference? It did not make a difference, it still stalled at the same spot.

ryanjpearson avatar Jan 26 '21 04:01 ryanjpearson

I also noticed that my cwindow size was wrong, I had it set to 32 in my code. Since I had Compress = False and Lowlut = False, the CWINDOW size should be 256. I made this fix in the code, and it did not seem to matter. It still stalls after writing the 513th byte to address 0. I tried a build with IBSIZE = 32768. The o_done bit still goes high after i_progress reaches 449.

ryanjpearson avatar Feb 05 '21 17:02 ryanjpearson