doris icon indicating copy to clipboard operation
doris copied to clipboard

branch-3.1:[fix](broker load)fix broker load fail when <column from path> column already exists in file. (#58579)

Open hubgeter opened this issue 2 weeks ago • 4 comments

bp #58579

Problem Summary: fix :

LOAD LABEL labelx
(
    DATA INFILE("s3://bucket/load/k1=10/k3=30/test.parquet")
    INTO TABLE tableName
    FORMAT AS "parquet"
    (k2)
    COLUMNS FROM PATH AS (k1,k3)
)
WITH S3
( xxx)

and k1 or k3 column exists in test.parquet file.

error:

type:LOAD_RUN_FAIL; msg:errCode = 2, detailMessage = (127.0.0.1)[E-7412]
assert cast err:[E-7412] Bad cast from type:doris::vectorized::ColumnVector<(doris::PrimitiveType)5> to doris::vectorized::ColumnStr<unsigned int>

        0#  doris::Exception::Exception(int, std::basic_string_view<char, std::char_traits<char> > const&, bool) at /mnt/disk1/changyuwei/ldb_toolchain/bin/../lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/unique_ptr.h:193
        1#  doris::Exception::Exception(doris::Status const&) at /mnt/disk1/changyuwei/doris/be/src/common/exception.h:39
        2#  doris::vectorized::ColumnStr<unsigned int>& assert_cast<doris::vectorized::ColumnStr<unsigned int>&, (TypeCheckOnRelease)1, doris::vectorized::IColumn&>(doris::vectorized::IColumn&)::{lambda(auto:1&&)#1}::operator()<doris::vectorized::IColumn&>(doris::vectorized::IColumn&) const at /mnt/disk1/changyuwei/doris/be/src/vec/common/assert_cast.h:0
        3#  doris::vectorized::ColumnStr<unsigned int>& assert_cast<doris::vectorized::ColumnStr<unsigned int>&, (TypeCheckOnRelease)1, doris::vectorized::IColumn&>(doris::vectorized::IColumn&) at /mnt/disk1/changyuwei/doris/be/src/vec/common/assert_cast.h:72
        4#  doris::vectorized::DataTypeStringSerDeBase<doris::vectorized::ColumnStr<unsigned int> >::deserialize_one_cell_from_json(doris::vectorized::IColumn&, doris::Slice&, doris::vectorized::DataTypeSerDe::FormatOptions const&) const at /mnt/disk1/changyuwei/doris/be/src/vec/data_types/serde/data_type_string_serde.cpp:108
        5#  doris::vectorized::DataTypeNullableSerDe::deserialize_one_cell_from_json(doris::vectorized::IColumn&, doris::Slice&, doris::vectorized::DataTypeSerDe::FormatOptions const&) const at /mnt/disk1/changyuwei/doris/be/src/common/status.h:525
        6#  doris::vectorized::DataTypeNullableSerDe::deserialize_column_from_fixed_json(doris::vectorized::IColumn&, doris::Slice&, unsigned long, unsigned long*, doris::vectorized::DataTypeSerDe::FormatOptions const&) const at /mnt/disk1/changyuwei/doris/be/src/common/status.h:525
        7#  doris::vectorized::RowGroupReader::_fill_partition_columns(doris::vectorized::Block*, unsigned long, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::tuple<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, doris::SlotDescriptor const*>, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::tuple<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, doris::SlotDescriptor const*> > > > const&) at /mnt/disk1/changyuwei/doris/be/src/common/status.h:567
        8#  doris::vectorized::RowGroupReader::next_batch(doris::vectorized::Block*, unsigned long, unsigned long*, bool*) at /mnt/disk1/changyuwei/doris/be/src/common/status.h:525
        9#  doris::vectorized::ParquetReader::get_next_block(doris::vectorized::Block*, unsigned long*, bool*) at /mnt/disk1/changyuwei/doris/be/src/common/status.h:520
        10# doris::vectorized::FileScanner::_get_block_wrapped(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /mnt/disk1/changyuwei/doris/be/src/common/status.h:525
        11# doris::vectorized::FileScanner::_get_block_impl(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /mnt/disk1/changyuwei/doris/be/src/common/status.h:525
        12# doris::vectorized::Scanner::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /mnt/disk1/changyuwei/doris/be/src/common/status.h:525
        13# doris::vectorized::Scanner::get_block_after_projects(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /mnt/disk1/changyuwei/doris/be/src/vec/exec/scan/scanner.cpp:87
        14# doris::vectorized::ScannerScheduler::_scanner_scan(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>) at /mnt/disk1/changyuwei/doris/be/src/vec/exec/scan/scanner_scheduler.cpp:182

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • [ ] Regression test
    • [ ] Unit Test
    • [ ] Manual test (add detailed scripts or steps below)
    • [ ] No need to test or manual test. Explain why:
      • [ ] This is a refactor/code format and no logic has been changed.
      • [ ] Previous test can cover this change.
      • [ ] No code files have been changed.
      • [ ] Other reason
  • Behavior changed:

    • [ ] No.
    • [ ] Yes.
  • Does this need documentation?

    • [ ] No.
    • [ ] Yes.

Check List (For Reviewer who merge this PR)

  • [ ] Confirm the release note
  • [ ] Confirm test cases
  • [ ] Confirm document
  • [ ] Add branch pick label

hubgeter avatar Dec 10 '25 08:12 hubgeter

Thank you for your contribution to Apache Doris. Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

hello-stephen avatar Dec 10 '25 08:12 hello-stephen

run buildall

hubgeter avatar Dec 10 '25 08:12 hubgeter

BE UT Coverage Report

Increment line coverage 0.00% (0/5) :tada:

Increment coverage report Complete coverage report

Category Coverage
Function Coverage 45.81% (12956/28284)
Line Coverage 36.67% (116007/316355)
Region Coverage 34.21% (66105/193252)
Branch Coverage 31.23% (34718/111156)

doris-robot avatar Dec 10 '25 10:12 doris-robot

BE Regression && UT Coverage Report

Increment line coverage 100.00% (5/5) :tada:

Increment coverage report Complete coverage report

Category Coverage
Function Coverage 76.43% (21264/27820)
Line Coverage 69.68% (220099/315855)
Region Coverage 67.64% (131347/194199)
Branch Coverage 61.31% (68512/111744)

hello-stephen avatar Dec 10 '25 11:12 hello-stephen