DAOS-9218 test: data-mover erasure code support

Open knard38 opened this issue 3 years ago • 8 comments

Quick-Functional: true
Test-tag: iosysadmin
Test-repeat: 20
Signed-off-by: Cedric Koch-Hofer [email protected]

knard38 avatar Aug 17 '22 13:08 knard38

Bug-tracker data:
Ticket title is 'serialization/deserialization not working with data protection'
Status is 'In Review'
Labels: 'tds,triaged'
Job should run at elevated priority (3)
https://daosio.atlassian.net/browse/DAOS-9218

github-actions[bot] avatar Aug 17 '22 13:08 github-actions[bot]

Test stage Functional Hardware Large completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-10008/2/testReport/(root)/

daosbuild1 avatar Sep 05 '22 18:09 daosbuild1

Test stage Functional Hardware Large completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-10008/3/testReport/(root)/

daosbuild1 avatar Sep 06 '22 09:09 daosbuild1

Test stage Functional Hardware Large completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-10008/8/testReport/(root)/

daosbuild1 avatar Sep 13 '22 20:09 daosbuild1

Test stage Functional Hardware Large completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-10008/9/testReport/(root)/

daosbuild1 avatar Sep 14 '22 16:09 daosbuild1

Test stage Functional Hardware Large completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-10008/11/testReport/(root)/

daosbuild1 avatar Sep 20 '22 16:09 daosbuild1

Test stage Functional Hardware Large completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-10008/12/testReport/(root)/

daosbuild1 avatar Sep 27 '22 12:09 daosbuild1

Test stage Functional Hardware Large completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-10008/13/testReport/(root)/

daosbuild1 avatar Oct 12 '22 01:10 daosbuild1

Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10008/17/execution/node/1139/log

daosbuild1 avatar Oct 21 '22 05:10 daosbuild1

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10008/17/execution/node/1084/log

daosbuild1 avatar Oct 21 '22 06:10 daosbuild1

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10008/18/execution/node/1083/log

daosbuild1 avatar Oct 22 '22 12:10 daosbuild1

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10008/19/execution/node/1083/log

daosbuild1 avatar Oct 25 '22 21:10 daosbuild1

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10008/20/execution/node/1130/log

daosbuild1 avatar Oct 27 '22 09:10 daosbuild1

The CI error should not be related to this PR: the same error as the one reported in ticket https://daosio.atlassian.net/browse/DAOS-11943 also occurs in the latest CI build.

knard38 avatar Oct 27 '22 09:10 knard38

@liuxuezhao, @wangdi1 or @jolivier23, could you please check whether the C code looks OK to you? Review of this part of the code is the only thing still missing before I can request landing of this prehistoric ticket ;)

knard38 avatar Nov 02 '22 10:11 knard38

This PR did not run https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-10008/20/artifact/Functional%20Hardware%20Large/deployment/io_sys_admin.py/results.html due to DAOS-9218.

The most recent commit message needs to start with DAOS-9218 <some text>. It looks like this was missing in the last commit message. Please merge with master and include DAOS-9218 in the commit message along with the required tags.

phender avatar Nov 09 '22 21:11 phender

As suggested by @phender, I have merged with master and pushed to CI with a valid commit message. I also took the opportunity to add a small patch that allows the daos-launch.sh script to be used with git bisect.

knard38 avatar Nov 10 '22 07:11 knard38

Test stage checkpatch completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10008/22/execution/node/145/log

daosbuild1 avatar Nov 14 '22 08:11 daosbuild1

Test stage NLT on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10008/22/execution/node/638/log

daosbuild1 avatar Nov 14 '22 08:11 daosbuild1

Test stage NLT on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10008/23/execution/node/246/log

daosbuild1 avatar Nov 14 '22 16:11 daosbuild1

Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10008/24/execution/node/333/log

daosbuild1 avatar Nov 14 '22 19:11 daosbuild1

Test stage Build RPM on Leap 15 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10008/24/execution/node/307/log

daosbuild1 avatar Nov 14 '22 19:11 daosbuild1

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10008/24/execution/node/330/log

daosbuild1 avatar Nov 14 '22 19:11 daosbuild1

Test stage Build on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10008/24/execution/node/438/log

daosbuild1 avatar Nov 14 '22 19:11 daosbuild1

Test stage Build on Leap 15 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10008/24/execution/node/477/log

daosbuild1 avatar Nov 14 '22 20:11 daosbuild1

Test stage NLT on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10008/25/execution/node/638/log

daosbuild1 avatar Nov 14 '22 20:11 daosbuild1

I assume you know how to read the failures from this. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-10008/25/valgrindResult/pid=34159,0x63/

There's a segfault at address 0 (a NULL pointer read) on this line: `if (shard_arg->la_recxs[i].rx_idx & PARITY_INDICATOR)`

ashleypittman avatar Nov 15 '22 09:11 ashleypittman

> I assume you know how to read the failures from this. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-10008/25/valgrindResult/pid=34159,0x63/
>
> There's a segfault at address 0 on this line: `if (shard_arg->la_recxs[i].rx_idx & PARITY_INDICATOR)`

I have to admit that this is my first time working with NLT failures. I had been trying to run the tests locally without much success until now :'( However, with your pointers, I should be able to fix the issue easily :) Thanks a lot :)

knard38 avatar Nov 15 '22 09:11 knard38

In the end, I was able to run the NLT tests locally, then reproduce and fix the NULL pointer read. Thanks again @ashleypittman for your help 😄

knard38 avatar Nov 15 '22 10:11 knard38
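The actual fix is not shown in this thread. Purely as an illustration of the kind of NULL pointer read reported above, here is a minimal sketch of guarding the `la_recxs` array before testing the parity bit. The struct layouts, the `PARITY_INDICATOR` value, the `la_nr` field and the helper name `count_parity_recxs` are all assumptions made for this example and may not match the DAOS sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed value for illustration only; the real DAOS macro may differ. */
#define PARITY_INDICATOR (1ULL << 63)

/* Hypothetical stand-in for a record extent. */
struct recx {
	uint64_t rx_idx; /* start index of the extent */
	uint64_t rx_nr;  /* number of records in the extent */
};

/* Hypothetical stand-in for the per-shard listing argument. */
struct shard_list_arg {
	struct recx  *la_recxs; /* may be NULL when no extents were requested */
	unsigned int  la_nr;    /* number of entries the caller asked for */
};

/*
 * Count the extents flagged as parity. Returns 0 instead of dereferencing
 * a NULL array, which is the read valgrind flagged in the NLT run.
 */
static unsigned int
count_parity_recxs(const struct shard_list_arg *shard_arg)
{
	unsigned int i;
	unsigned int count = 0;

	if (shard_arg == NULL || shard_arg->la_recxs == NULL)
		return 0;

	for (i = 0; i < shard_arg->la_nr; i++) {
		if (shard_arg->la_recxs[i].rx_idx & PARITY_INDICATOR)
			count++;
	}
	return count;
}
```

Whether the right fix is a guard like this or ensuring the array is always allocated depends on the caller's contract, which this thread does not describe.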

> > I assume you know how to read the failures from this. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-10008/25/valgrindResult/pid=34159,0x63/
> > There's a segfault at address 0 on this line: `if (shard_arg->la_recxs[i].rx_idx & PARITY_INDICATOR)`
>
> I have to admit that this is my first time working with NLT failures. I had been trying to run the tests locally without much success until now :'( However, with your pointers, I should be able to fix the issue easily :) Thanks a lot :)

This is largely about knowing how and where Jenkins reports valgrind failures, and NLT is the number 1 reporter of these. When there's a crash, you typically see a large number of failures reported, and it can be hard to know which ones to focus on.

ashleypittman avatar Nov 15 '22 10:11 ashleypittman