
Character encoding issue?

Open mathaefele opened this issue 5 years ago • 8 comments

Describe the bug

I am new to pdwfs. It looks really nice but the result of the simple example I tried to build is different with and without pdwfs.

How to reproduce

redis-server --daemonize yes
echo "########### Launching simu ##############"
pdwfs -p $PWD/staged -- ./simu
redis-cli dump "/local/home/mhaefele/ownCloud/work/dev/hello_worlds/pdwfs/C/staged/Cpok:0"
echo "########### Launching post-process ##############"
pdwfs -p $PWD/staged -- ./post-process
echo "########### Done ##############"
redis-cli shutdown
  • ./simu is a C program that writes Hello444 ten times to staged/Cpok
  • ./post-process reads staged/Cpok and writes only its first line to ./resC
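
For readers without the attachment, here is a minimal sketch of what the two programs might look like. It is reconstructed from the trace output and the fscanf call quoted further down this thread, so it is an assumption rather than the actual sources: the function names `write_hello` and `read_first_token` are hypothetical, and the read is bounded with `%63s` instead of a bare `%s`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for ./simu: writes "Hello444\n" `count` times. */
int write_hello(const char *path, int count)
{
    const char *line = "Hello444\n";
    size_t len = strlen(line); /* 9 bytes, as in fwrite(size=1, nmemb=9) */
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    for (int i = 0; i < count; i++) {
        if (fwrite(line, 1, len, f) != len) {
            fclose(f);
            return -1;
        }
    }
    return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical stand-in for ./post-process: reads the first
 * whitespace-delimited token with fscanf. `buf` must hold 64 bytes. */
int read_first_token(const char *path, char *buf)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int matched = fscanf(f, "%63s", buf);
    fclose(f);
    return matched == 1 ? 0 : -1;
}
```

In this sketch ./simu would call `write_hello("staged/Cpok", 10)` and ./post-process would call `read_first_token("staged/Cpok", buf)` before writing `buf` to ./resC.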

Expected behaviour

I expect a single "Hello444" in resC, which is what I get without pdwfs. With pdwfs I get the following:

mhaefele@mdlspc113:C $ ./launch.sh 
15287:C 13 Nov 14:08:15.990 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
15287:C 13 Nov 14:08:15.990 # Redis version=4.0.9, bits=64, commit=00000000, modified=0, pid=15287, just started
15287:C 13 Nov 14:08:15.990 # Configuration loaded
########### Launching simu ##############
"\x00\xc3\x11@Z\tHello444\nH\xe0E\b\x014\n\b\x00\xc4\xa2\r\xfe\x05f\xedy"
########### Launching post-process ##############
post-process: 
########### Done ##############
mhaefele@mdlspc113:C $ cat resC 

Am I doing something wrong? Is this related to character encoding?

Thanks for your help,
Mat

mathaefele avatar Nov 13 '19 13:11 mathaefele

Hi Mat, Thanks for looking into pdwfs!

I think I managed to reproduce your issue. This is not an encoding issue, but it is definitely a bug. I assume you used the standard C function getline in your post-process program to read lines from staged/Cpok? If so, the problem is that pdwfs did not intercept this function. Instead, the "real" getline from the libc was called on a file stream that the libc does not manage, which is undefined behaviour. What your post-process read was whatever happened to be in the memory buffer passed to getline.

getline was not intercepted because the compiler inlines it and replaces it with a call to the libc function __getdelim, which pdwfs did not intercept.
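
The relationship between the two functions can be sketched as follows. This illustrates the idea only and is not the libc's or pdwfs's actual code: `getline_via_getdelim` and `first_line_length` are hypothetical names, and the sketch calls the public POSIX `getdelim` rather than the internal `__getdelim` symbol.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical illustration: reading a line is reading up to the
 * delimiter '\n'. Once a compiler has inlined a call like this, the
 * binary references the getdelim family and not the getline symbol,
 * so hooking getline alone would not see the call. */
ssize_t getline_via_getdelim(char **lineptr, size_t *n, FILE *stream)
{
    return getdelim(lineptr, n, '\n', stream);
}

/* Returns the length of the next line without its newline, or -1 if
 * nothing could be read. */
long first_line_length(FILE *stream)
{
    char *line = NULL;
    size_t cap = 0;
    ssize_t got = getline_via_getdelim(&line, &cap, stream);
    long len = (got < 0) ? -1 : (long)strcspn(line, "\n");
    free(line); /* the buffer may be allocated even when the read fails */
    return len;
}
```

The practical consequence for an interception layer is that both entry points need to be covered, since which symbol a program ends up calling depends on how it was compiled.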

I have pushed the branch fix-github-issue-2, which fixes the problem. Could you try this branch?

I hope that you actually used getline! If yes, then I think I solved your issue. If not, well, at least I fixed one issue...

JCapul avatar Nov 19 '19 12:11 JCapul

btw, you can easily check what calls are intercepted by pdwfs by running it with the -t (trace) option: pdwfs -t -p $PWD/staged -- ./post-process

JCapul avatar Nov 19 '19 12:11 JCapul

I am using

fscanf(f, "%s", buffer);

and it looks like it is not intercepted either, according to the run with tracing enabled:

mhaefele@mdlspc113:C $ ./launch.sh 
30425:C 19 Nov 14:12:30.600 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
30425:C 19 Nov 14:12:30.601 # Redis version=4.0.9, bits=64, commit=00000000, modified=0, pid=30425, just started
30425:C 19 Nov 14:12:30.601 # Configuration loaded
########### Launching simu ##############
[PDWFS][30436][TRACE][C] intercepting fopen(path=staged/Cpok, mode=w)
[PDWFS][30436][TRACE][C] intercepting fwrite(ptr=0x5566d0d337d2, size=1, nmemb=9, stream=0x5566d168b040)
[PDWFS][30436][TRACE][C] intercepting fwrite(ptr=0x5566d0d337d2, size=1, nmemb=9, stream=0x5566d168b040)
[PDWFS][30436][TRACE][C] intercepting fwrite(ptr=0x5566d0d337d2, size=1, nmemb=9, stream=0x5566d168b040)
[PDWFS][30436][TRACE][C] intercepting fwrite(ptr=0x5566d0d337d2, size=1, nmemb=9, stream=0x5566d168b040)
[PDWFS][30436][TRACE][C] intercepting fwrite(ptr=0x5566d0d337d2, size=1, nmemb=9, stream=0x5566d168b040)
[PDWFS][30436][TRACE][C] intercepting fwrite(ptr=0x5566d0d337d2, size=1, nmemb=9, stream=0x5566d168b040)
[PDWFS][30436][TRACE][C] intercepting fwrite(ptr=0x5566d0d337d2, size=1, nmemb=9, stream=0x5566d168b040)
[PDWFS][30436][TRACE][C] intercepting fwrite(ptr=0x5566d0d337d2, size=1, nmemb=9, stream=0x5566d168b040)
[PDWFS][30436][TRACE][C] intercepting fwrite(ptr=0x5566d0d337d2, size=1, nmemb=9, stream=0x5566d168b040)
[PDWFS][30436][TRACE][C] intercepting fwrite(ptr=0x5566d0d337d2, size=1, nmemb=9, stream=0x5566d168b040)
[PDWFS][30436][TRACE][C] intercepting fclose(stream=0x5566d168b040)
[PDWFS][30436][TRACE][C] intercepting close(fd=5)
[PDWFS][30436][TRACE][C] intercepting close(fd=5)
[PDWFS][30436][TRACE][C] calling libc close
"\x00\xc3\x11@Z\tHello444\nH\xe0E\b\x014\n\b\x00\xc4\xa2\r\xfe\x05f\xedy"
########### Launching post-process ##############
[PDWFS][30453][TRACE][C] intercepting fopen(path=staged/Cpok, mode=r)
[PDWFS][30453][TRACE][C] intercepting fclose(stream=0x564475d11040)
[PDWFS][30453][TRACE][C] intercepting close(fd=5)
[PDWFS][30453][TRACE][C] intercepting close(fd=5)
[PDWFS][30453][TRACE][C] calling libc close
post-process: 
[PDWFS][30453][TRACE][C] intercepting fopen(path=resC, mode=w)
[PDWFS][30453][TRACE][C] calling libc fopen
[PDWFS][30453][TRACE][C] intercepting fprintf(stream=0x564475d126c0, ...)
[PDWFS][30453][TRACE][C] intercepting fputs(s=
, stream=0x564475d126c0)
[PDWFS][30453][TRACE][C] calling libc fputs
[PDWFS][30453][TRACE][C] intercepting fclose(stream=0x564475d126c0)
[PDWFS][30453][TRACE][C] calling libc fclose
########### Done ##############

Anyway, I was just trying out how this works with hello world examples. But I agree, in a real code you might want to read the full buffer with fread and parse the ASCII content in memory...

I'll give your branch a try, but first I need to compile pdwfs. Until now I made the minimum effort using the available binaries :blush:

mathaefele avatar Nov 19 '19 13:11 mathaefele

I've just tested with fread and parsing it in memory, it works!
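
A minimal sketch of that approach, assuming the file fits in a fixed-size buffer; the helper names `slurp` and `first_line` are hypothetical and not taken from the attached example:

```c
#include <stdio.h>
#include <string.h>

/* Reads up to cap-1 bytes of `path` with a single fread and
 * NUL-terminates the buffer. Returns the number of bytes read. */
size_t slurp(const char *path, char *buf, size_t cap)
{
    if (cap == 0)
        return 0;
    buf[0] = '\0';
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;
    size_t n = fread(buf, 1, cap - 1, f);
    buf[n] = '\0';
    fclose(f);
    return n;
}

/* Copies the first line of the in-memory text (without its newline)
 * into `out`, truncating to cap-1 characters. Returns the copied length. */
size_t first_line(const char *text, char *out, size_t cap)
{
    if (cap == 0)
        return 0;
    size_t len = strcspn(text, "\n");
    if (len >= cap)
        len = cap - 1;
    memcpy(out, text, len);
    out[len] = '\0';
    return len;
}
```

The only stdio calls on the staged file are fopen, fread and fclose; the parsing itself happens on the in-memory copy.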

mathaefele avatar Nov 19 '19 13:11 mathaefele

yay! I'll check what's going on with fscanf, probably a similar story as getline.

JCapul avatar Nov 19 '19 13:11 JCapul

I'll check what's going on with fscanf, probably a similar story as getline.

OK, my bad: interception of fscanf was never actually implemented. I guess we never needed it in applications so far.

But failing a hello world example looks pretty bad... so we'll make it work!

JCapul avatar Nov 19 '19 14:11 JCapul

ok, I managed to compile and run my hello world with the fix-github-issue-2 version of pdwfs.

  • fscanf still broken (not surprising as only getline has been fixed)
  • fread still works

Tell me when I can try again.

mathaefele avatar Nov 19 '19 14:11 mathaefele

Here is the small hello world example in C that I am using: hello_pdwfs_C.zip

mathaefele avatar Nov 19 '19 15:11 mathaefele