alibabacloud-jindodata

Request: support for snappy-compressed files shipped from Alibaba Cloud Log Service (SLS) to OSS

Open cfgxy opened this issue 3 years ago • 2 comments

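Support for hadoop-snappy currently works fine. However, the snappy-compressed files that SLS ships to OSS do not appear to be hadoop-snappy. A binary comparison shows that the header of an SLS-shipped snappy file is a few bytes shorter than that of a normal hadoop-snappy file, and the snzip tool only decompresses it correctly when given the option -t raw. Within Alibaba Cloud's own ecosystem, adding support for this format seems reasonable.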
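The header difference can also be checked programmatically. The following is only a heuristic sketch, not part of JindoFS or SLS. It assumes that a hadoop-snappy file starts with two 4-byte big-endian lengths (uncompressed block size, then compressed chunk size) followed by a raw snappy chunk, that a framing2 file starts with the `sNaPpY` stream identifier, and that a raw snappy stream starts with a varint holding the uncompressed length. The names `read_varint` and `guess_snappy_format` are hypothetical.

```python
import struct

# Stream identifier chunk that opens a framing2 (snappy framing format) file.
FRAMING_MAGIC = b"\xff\x06\x00\x00sNaPpY"


def read_varint(buf, pos=0):
    """Decode a little-endian base-128 varint; return (value, next_pos) or None."""
    value = 0
    shift = 0
    while pos < len(buf) and shift <= 28:  # a 32-bit varint uses at most 5 bytes
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7
    return None


def guess_snappy_format(head, file_size):
    """Guess the container format from the first bytes of a file (heuristic only)."""
    if head.startswith(FRAMING_MAGIC):
        return "framing2"
    if len(head) >= 9:
        raw_len, chunk_len = struct.unpack(">II", head[:8])
        inner = read_varint(head, 8)
        # Assumed hadoop-snappy layout: the first chunk must fit in the file,
        # and the raw snappy chunk after the two length fields must not claim
        # more uncompressed data than the enclosing block declares.
        if (0 < chunk_len <= file_size - 8 and inner is not None
                and 0 < inner[0] <= raw_len):
            return "hadoop-snappy"
    if read_varint(head) is not None:
        return "raw"
    return "unknown"
```

Passing the first few dozen bytes of an object together with its total size is enough for this check. Because a raw stream has no magic number, a "raw" result only means that nothing contradicted that guess.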

Attached is the list of formats supported by snzip:

snzip 1.0.4

  Usage: snzip [option ...] [file ...]

  general options:
   -c       output to standard output, keep original files unchanged
   -d       decompress
   -k       keep (don't delete) input files
   -t name  file format name. see below. The default format is framing2.
   -h       give this help

  raw_format option:
   -s size  size of input data when compressing.
            The default value is the file size if available.

  tuning options:
   -b num   internal block size in bytes
   -B num   internal block size. 'num'-th power of two.
   -R num   size of read buffer in bytes
   -W num   size of write buffer in bytes
   -T       trace for debug

  supported formats:
    NAME            SUFFIX  URL
    ----            ------  ---
    framing2        sz      https://github.com/google/snappy/blob/master/framing_format.txt
    hadoop-snappy   snappy  https://code.google.com/p/hadoop-snappy/
    raw             raw     https://github.com/google/snappy/blob/master/format_description.txt
    iwa             iwa     https://github.com/obriensp/iWorkFileFormat/blob/master/Docs/index.md#snappy-compression
    framing         sz      https://github.com/google/snappy/blob/0755c815197dacc77d8971ae917c86d7aa96bf8e/framing_format.txt
    snzip           snz     https://github.com/kubo/snzip
    snappy-java     snappy  https://github.com/xerial/snappy-java
    snappy-in-java  snappy  https://github.com/dain/snappy
    comment-43      snappy  http://code.google.com/p/snappy/issues/detail?id=34#c43
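Until the format is supported natively, the snzip workaround can be scripted. This is a minimal sketch that assumes `snzip` is installed and on `PATH`. It uses only the `-d`, `-c`, and `-t` options listed in the help output above. The helper names are hypothetical.

```python
import subprocess


def snzip_decompress_command(path, fmt="raw"):
    """Build the snzip argv that decompresses `path` to stdout, leaving the input file in place."""
    return ["snzip", "-d", "-c", "-t", fmt, path]


def decompress_with_snzip(path, fmt="raw"):
    """Run snzip and return the decompressed bytes; raises CalledProcessError on failure."""
    result = subprocess.run(
        snzip_decompress_command(path, fmt),
        check=True,
        stdout=subprocess.PIPE,
    )
    return result.stdout
```

Calling `decompress_with_snzip(path)` on a locally downloaded copy of an SLS-shipped object would return its decompressed contents, provided the file really is in the raw format.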

cfgxy, Jul 05 '21 03:07

This has little to do with JindoFS. Using emr-hadoop should solve your problem.

adrian-wang, Jul 05 '21 08:07

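> Within Alibaba Cloud's own ecosystem, adding support for this format seems reasonable.

Could you clarify this? For example, what specific support do you need from the JindoFS SDK for data in this format on OSS?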

drankye, Jul 07 '21 11:07