clp icon indicating copy to clipboard operation
clp copied to clipboard

core-clp: Refactor `LibarchiveFileReader` to properly handle empty data blocks (fixes #389).

Open LinZhihao-723 opened this issue 1 year ago • 3 comments

Description

The current LibarchiveFileReader implementation has the following problems:

  • It doesn't handle the case where the data block is empty, which is the root cause of #389
  • Not all data members are initialized in the default constructor
  • Buffer peeking method's signature can be improved with C++20 enabled

This PR makes the following changes accordingly:

  • Handle the empty data block properly. All empty blocks will be skipped and only nonempty blocks can be read
  • Initialize all data members in the member declaration (which matches our latest coding guideline)
  • Using std::span for peeking buffer content

Validation performed

  • Ensure CI passed
  • Ensure the bug has been fixed in #389

LinZhihao-723 avatar Jun 21 '24 05:06 LinZhihao-723

A high level comment: Methodlogy wise, I feel the rest of our code tend to let caller check if the data_block is empty, rather having a function to read the next 'none-empty' data block.

That said, I don't see a nit way of doing it unless we let try_load_data_block return a new type of error code and let the caller check it.

One option I might consider is to keep the private method read_next_data_block the unchanged, but add the while loop in try_load_data_block(). This is really a nit though

haiqi96 avatar Jun 25 '24 15:06 haiqi96

A high level comment: Methodlogy wise, I feel the rest of our code tend to let caller check if the data_block is empty, rather having a function to read the next 'none-empty' data block.

That said, I don't see a nit way of doing it unless we let try_load_data_block return a new type of error code and let the caller check it.

One option I might consider is to keep the private method read_next_data_block the unchanged, but add the while loop in try_load_data_block(). This is really a nit though

The whole reason why we need this fix is because we don't have any high level code handling empty blocks. And I don't think the high level code should handle empty blocks in this case since the empty blocks should be hidden from the caller. This class is to provide abstraction to read data using libarchive, so the upper level caller shouldn't really care how data blocks are maintained (like there could be an empty data block) in different archives

LinZhihao-723 avatar Jun 26 '24 09:06 LinZhihao-723

A high level comment: Methodlogy wise, I feel the rest of our code tend to let caller check if the data_block is empty, rather having a function to read the next 'none-empty' data block. That said, I don't see a nit way of doing it unless we let try_load_data_block return a new type of error code and let the caller check it. One option I might consider is to keep the private method read_next_data_block the unchanged, but add the while loop in try_load_data_block(). This is really a nit though

The whole reason why we need this fix is because we don't have any high level code handling empty blocks. And I don't think the high level code should handle empty blocks in this case since the empty blocks should be hidden from the caller. This class is to provide abstraction to read data using libarchive, so the upper level caller shouldn't really care how data blocks are maintained (like there could be an empty data block) in different archives

A high level comment: Methodlogy wise, I feel the rest of our code tend to let caller check if the data_block is empty, rather having a function to read the next 'none-empty' data block. That said, I don't see a nit way of doing it unless we let try_load_data_block return a new type of error code and let the caller check it. One option I might consider is to keep the private method read_next_data_block the unchanged, but add the while loop in try_load_data_block(). This is really a nit though

The whole reason why we need this fix is because we don't have any high level code handling empty blocks. And I don't think the high level code should handle empty blocks in this case since the empty blocks should be hidden from the caller. This class is to provide abstraction to read data using libarchive, so the upper level caller shouldn't really care how data blocks are maintained (like there could be an empty data block) in different archives

Perhaps I didn't make it clear, the caller here I was suggesting was the caller of read_next_data_block, so it's still within libarchiveReader.

That said, what I was really suggesting was to hide the "non_empty" details from the caller of read_next_data_block.. Instead, the read_next_data_block can handle the empty_block with a while loop. Unless we see a future need to read an empty block, otherwise I don't think we need to expose the details all the way up to try_load_nonempty_data_block, which is called by FileCompessor. Anyway, it's my personal preference and I will let you decide.

haiqi96 avatar Jun 26 '24 14:06 haiqi96

Anyway, it's my personal preference and I will let you decide.

Remind me, Zhihao, did we reach a conclusion on this?

Also, can you resolve the conflicts?

kirkrodrigues avatar Nov 13 '24 12:11 kirkrodrigues

Anyway, it's my personal preference and I will let you decide.

Remind me, Zhihao, did we reach a conclusion on this?

Also, can you resolve the conflicts?

No, we didn't reach a conclusion. I will resolve the conflicts soon

LinZhihao-723 avatar Nov 14 '24 02:11 LinZhihao-723