
[optimize](cooldown) check remote meta path exists before trying to follow cooldowned data

Open DarvenDuan opened this issue 9 months ago • 4 comments

Proposed changes

Issue Number: close #xxx

If a storage policy is set for a tablet, Doris chooses one replica to perform the cooldown, and the other replicas follow it. However, the chosen replica may not have finished cooling down when the others try to follow, so the remote meta file does not exist yet and Doris logs an exception like this:

W0531 13:28:06.202108 367095 file_system.cpp:34] [IO_ERROR]failed to get file size xxx/136930872/140650777.0.meta, (endpoint: http://xxx, bucket: xxx, key:xxx/136930872/140650777.0.meta, ), No response body., error code 404, request id

	0#  doris::io::S3FileSystem::file_size_impl(std::filesystem::__cxx11::path const&, long*) const at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187
	1#  doris::io::S3FileSystem::open_file_internal(doris::io::FileDescription const&, std::filesystem::__cxx11::path const&, std::shared_ptr<doris::io::FileReader>*) at /root/jdolap-engine/be/src/common/status.h:446
	2#  doris::io::RemoteFileSystem::open_file_impl(doris::io::FileDescription const&, std::filesystem::__cxx11::path const&, doris::io::FileReaderOptions const&, std::shared_ptr<doris::io::FileReader>*) at /root/jdolap-engine/be/src/common/status.h:446
	3#  doris::io::FileSystem::open_file(doris::io::FileDescription const&, doris::io::FileReaderOptions const&, std::shared_ptr<doris::io::FileReader>*) at /root/jdolap-engine/be/src/common/status.h:357
	4#  doris::Tablet::_read_cooldown_meta(std::shared_ptr<doris::io::RemoteFileSystem> const&, doris::TabletMetaPB*) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187
	5#  doris::Tablet::_follow_cooldowned_data() at /root/jdolap-engine/be/src/common/status.h:446
	6#  doris::Tablet::cooldown() at /root/jdolap-engine/be/src/common/status.h:446
	7#  std::_Function_handler<void (), doris::StorageEngine::_cooldown_tasks_producer_callback()::$_1>::_M_invoke(std::_Any_data const&) at /root/jdolap-engine/be/src/olap/olap_server.cpp:1076
	8#  doris::WorkThreadPool<true>::work_thread(int) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:646
	9#  execute_native_thread_routine at /data/gcc-11.1.0/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:85
	10# start_thread
	11# clone
W0531 13:28:06.202123 367095 olap_server.cpp:1080] failed to cooldown, tablet: 136930872 err: [INTERNAL_ERROR]cannot read cooldown meta

optimize: check whether the remote tablet meta path exists before opening it
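As a rough illustration of the idea (not the actual patch), the follower can probe for the meta object first and treat "not there yet" as a benign outcome instead of an I/O error. `FakeRemoteFs`, `FollowResult`, and `follow_cooldowned_data` below are hypothetical names invented for this sketch; they only model the control flow and are not Doris's `io::RemoteFileSystem` or `Tablet` APIs.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical in-memory stand-in for a remote object store (assumption,
// not Doris's io::RemoteFileSystem).
struct FakeRemoteFs {
    std::set<std::string> objects;  // keys that currently exist remotely
    int open_calls = 0;             // counts attempts to open a file

    bool exists(const std::string& path) const { return objects.count(path) > 0; }

    // Returns false when the object is missing (models the 404 in the log).
    bool open(const std::string& path) {
        ++open_calls;
        return objects.count(path) > 0;
    }
};

// Hypothetical outcome type for a follow attempt.
enum class FollowResult { kFollowed, kNotReadyYet, kReadFailed };

// Check that the remote meta path exists before trying to open it.
FollowResult follow_cooldowned_data(FakeRemoteFs& fs, const std::string& meta_path) {
    assert(!meta_path.empty());
    if (!fs.exists(meta_path)) {
        // The chosen replica has not uploaded its meta yet: skip quietly.
        return FollowResult::kNotReadyYet;
    }
    if (!fs.open(meta_path)) {
        // The object vanished between the check and the open, or the read failed.
        return FollowResult::kReadFailed;
    }
    return FollowResult::kFollowed;
}
```

The open can still fail after a successful existence check, so the error path is kept; the check only avoids the noisy warning and stack trace in the common case where the meta file has simply not been written yet.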

DarvenDuan, May 31 '24 07:05