Big-endian seems to work: maybe remove misleading requirement on CRAN?

Open barracuda156 opened this issue 2 years ago • 3 comments

CRAN lists little-endian as a requirement. Why is it so? What may be needed to add big-endian support?

P. S. fstlib claims that it can be compiled on all major platforms, and zstd and lz4 certainly build and work fine on Big-endian platforms. @MarcusKlik Could you please comment on this?

barracuda156 avatar Mar 06 '23 07:03 barracuda156

Hmm, it seems to work fine on ppc:


R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: powerpc-apple-darwin10.8.0 (32-bit)

> 
> # required packages
> library(testthat)
> library(fstcore)
> library(lintr)
> 
> # run tests
> test_check("fstcore")
[ FAIL 0 | WARN 0 | SKIP 2 | PASS 11 ]

══ Skipped tests ═══════════════════════════════════════════════════════════════
• On CRAN (2)

[ FAIL 0 | WARN 0 | SKIP 2 | PASS 11 ]
> 
> proc.time()
   user  system elapsed 
  4.561   0.287   4.878 

barracuda156 avatar Apr 09 '23 10:04 barracuda156

Everything works:

R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: powerpc-apple-darwin10.8.0 (32-bit)


> 
> # required packages
> library(testthat)
> library(fst)
> 
> # run tests
> test_check("fst")
[ FAIL 0 | WARN 0 | SKIP 1 | PASS 823 ]

══ Skipped tests ═══════════════════════════════════════════════════════════════
• On CRAN (1)

[ FAIL 0 | WARN 0 | SKIP 1 | PASS 823 ]
> 
> proc.time()
   user  system elapsed 
 99.967   1.810 101.811 

barracuda156 avatar Apr 09 '23 11:04 barracuda156

Hi @barracuda156, you're absolutely right: big-endian reads and writes work correctly when both are done on the same system.

The problem arises when you transport the resulting fst files from a big-endian system to a little-endian system or vice versa. In that case, integer reads, for example, can get mixed up because the byte orders are reversed. This could be solved by adjusting the reading algorithm for those cases, if there is a need for that.

So the LZ4 and ZSTD compression used by fst is already endian-safe (see the block format description), but reading the decompressed integers into memory is not...

MarcusKlik avatar Dec 01 '23 09:12 MarcusKlik
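
To illustrate the cross-platform problem described in the last comment, here is a minimal Python sketch. It is not fst's reader and does not use the fst file format. The helper names `write_int32` and `read_int32` are hypothetical, and the idea that a reader knows the writer's byte order (for instance from a header flag) is an assumption, not something the thread confirms about fst. The sketch writes 32-bit integers in big-endian order, decodes them as if they were little-endian, then decodes them again with the correct byte order.

```python
import struct

def write_int32(values, byteorder):
    """Serialize 32-bit signed ints the way a host with `byteorder` would lay them out."""
    prefix = "<" if byteorder == "little" else ">"
    return struct.pack(f"{prefix}{len(values)}i", *values)

def read_int32(buf, byteorder):
    """Decode 32-bit signed ints, interpreting the bytes with the given byte order."""
    prefix = "<" if byteorder == "little" else ">"
    return list(struct.unpack(f"{prefix}{len(buf) // 4}i", buf))

values = [1, 256, -2]

# Bytes as a big-endian machine would store them.
big_endian_bytes = write_int32(values, "big")

# A reader that assumes its own (little-endian) layout gets scrambled integers.
naive = read_int32(big_endian_bytes, "little")

# A reader that is told the writer's byte order (assumed to be known, e.g. from
# a hypothetical header flag) recovers the original values.
fixed = read_int32(big_endian_bytes, "big")

print(naive)  # [16777216, 65536, -16777217]
print(fixed)  # [1, 256, -2]
```

Same-system round trips never hit this, because writer and reader agree on the layout. That is consistent with the test suites passing on ppc above.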