ucx icon indicating copy to clipboard operation
ucx copied to clipboard

UCT/API: Removed count and stride fields from uct_iov_t structure.

Open rakhmets opened this issue 4 years ago • 21 comments

What

Removed count and stride fields from uct_iov_t structure.

Why ?

The fields are not used in current implementation.

rakhmets avatar Dec 03 '20 12:12 rakhmets

@rakhmets - since the API is installed by default, we are at risk here to break 3rd party projects. This is just one example that is available in public: https://github.com/open-mpi/ompi/blob/a541ab933dec0bb6f1572a9aa5c4ffa04d20e1e8/opal/mca/btl/uct/btl_uct_rdma.c#L105

AFAIK there are other 3rd party projects as UCG as well non-public projects.

shamisp avatar Dec 03 '20 14:12 shamisp

@rakhmets - since the API is installed by default, we are at risk here to break 3rd party projects. This is just one example that is available in public: https://github.com/open-mpi/ompi/blob/a541ab933dec0bb6f1572a9aa5c4ffa04d20e1e8/opal/mca/btl/uct/btl_uct_rdma.c#L105

AFAIK there are other 3rd party projects as UCG as well non-public projects.

@shamisp UCT API not guaranteed to be stable. If we want to stabilize it at some point we need to do cleanups, like this one.

yosefe avatar Dec 03 '20 14:12 yosefe

@yosefe since it is publicity available and already used by external project we cannot state "API not guaranteed to be stable" without hurting project reputation (we learned it hard way).The project reputation has higher priority than cleanup.

shamisp avatar Dec 03 '20 14:12 shamisp

@yosefe since it is publicity available and already used by external project we cannot state "API not guaranteed to be stable" without hurting project reputation (we learned it hard way).The project reputation has higher priority than cleanup.

@shamisp - btl_uct layer has autotools detection for uct/ucx API version to call the right APIs. we should make sure to notify all known users of UCT API in advance on change so they can make arrangements.

We should keep flexibility for UCT API and mind respective UCT users with advance notification on change.

mike-dubman avatar Dec 03 '20 15:12 mike-dubman

@mike-dubman For binary distribution autotools is not going to help. There are other projects using UCT. Once the header installed the cats are out of bag. We have the flexibility to extend the API. We don't the flexibility to break existing APIs. The cleanup just does not worth the reputation damage that it will cause to the project and community . I hardly see any benefits in removing strides.

shamisp avatar Dec 03 '20 15:12 shamisp

@mike-dubman For binary distribution autotools is not going to help. There are other projects using UCT. Once the header installed the cats are out of bag. We have the flexibility to extend the API. We don't the flexibility to break existing APIs. The cleanup just does not worth the reputation damage that it will cause to the project and community . I hardly see any benefits in removing strides.

for binary dist, projects relying on UCT and autotools (1 known out there) checks will not fail and detect incompat and warn user, which is standard way. we should keep UCT flex as it was defined, if there is a wish to change that - call a meeting and lets discuss.

mike-dubman avatar Dec 03 '20 16:12 mike-dubman

@mike-dubman current status quo for UCT - we can extend it but we don't break API. There is no change in existing policy. In this particular case benefits for the break are not clear. On the other hand the risk of reputation damage is quite substantial.

shamisp avatar Dec 03 '20 16:12 shamisp

@mike-dubman current status quo for UCT - we can extend it but we don't break API. There is no change in existing policy. In this particular case benefits for the break are not clear. On the other hand the risk of reputation damage is quite substantial.

it is new to me and disagree. call a meeting. UCT API is flex and not a subject to status quo.

mike-dubman avatar Dec 03 '20 16:12 mike-dubman

Please find performance test results showing the impact of removing the fields from the structure.

iovcnt Avg Latency v0*, ns Avg Latency v1**, ns Diff, ns*** Diff, %**** Speed-up
2 518.4 511.8 6.6 1.27% 1.013
3 544.0 539.8 4.2 0.77% 1.008
4 562.6 556.2 6.4 1.14% 1.012

* - UCX version from master branch ** - the same as v0, but stride and count fields are removed from uct_iov_t structure *** - difference between Avg Latency v0 and Avg Latency v1 **** - difference between Avg Latency v0 and Avg Latency v1 in percentage of Avg Latency v0

The following command used for testing: $ ucx_perftest -d memory -x sysv -D shortiov -t am_lat -s size -n 100000000 , where size - [ 8,8 | 8,8,8 | 8,8,8,8 ]

Tests were performed on arm target. UCX was compiled in release mode using gcc.

The content of cpuinfo: BogoMIPS : 400.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid asimdrdm CPU implementer : 0x43 CPU architecture : 8 CPU variant : 0x1 CPU part : 0x0af CPU revision : 1

rakhmets avatar Dec 30 '20 10:12 rakhmets

bot:pipe:retest

yosefe avatar Dec 30 '20 11:12 yosefe

@shamisp ping

yosefe avatar Jan 07 '21 08:01 yosefe

@yosefe Just saw this. Can you share the v0 code ? Seems like this is Thunderx-2. You have to pin processes on this platform otherwise it is very noisy. Is there any variance between the runs ? How does it look on x86 ?

shamisp avatar Jan 07 '21 22:01 shamisp

FYI, my command line does not accept this size format.

shamisp avatar Jan 08 '21 00:01 shamisp

@yosefe Just saw this. Can you share the v0 code ? Seems like this is Thunderx-2. You have to pin processes on this platform otherwise it is very noisy. Is there any variance between the runs ? How does it look on x86 ?

@shamisp Each average value was based on 5 runs with the same parameters. The deviation was from 1.14 to 7.7 for different set of experiments. The same set of experiments was performed on x86 as well. The difference on x86 was roughly the same. However the difference between two versions was lower than the deviation of values of the same version.

rakhmets avatar Jan 12 '21 07:01 rakhmets

FYI, my command line does not accept this size format.

@shamisp Yes, I forgot about this. Sorry for the inconvenience. For quick workaround please replace max_iov = attr.cap.am.max_iov; by smth like max_iov = SIZE_MAX; in src/tools/perf/lib/libperf.c.

rakhmets avatar Jan 12 '21 08:01 rakhmets

bot:pipe:retest

yosefe avatar Jan 13 '21 10:01 yosefe

@rakhmets can you pls fix test issue:

[2021-01-13T11:10:18.228Z] ../../../../test/gtest/uct/test_p2p_mix.cc: In member function ‘ucs_status_t uct_p2p_mix_test::am_short_iov(const uct_test::mapped_buffer&, const uct_test::mapped_buffer&, uct_completion_t*)’:
[2021-01-13T11:10:18.228Z] ../../../../test/gtest/uct/test_p2p_mix.cc:100:9: error: ‘uct_iov_t’ {aka ‘struct uct_iov’} has no member named ‘count’
[2021-01-13T11:10:18.228Z]   100 |     iov.count  = 1;
[2021-01-13T11:10:18.228Z]       |         ^~~~~
[2021-01-13T11:10:18.228Z] ../../../../test/gtest/uct/test_p2p_mix.cc:101:9: error: ‘uct_iov_t’ {aka ‘struct uct_iov’} has no member named ‘stride’
[2021-01-13T11:10:18.228Z]   101 |     iov.stride = 0;
[2021-01-13T11:10:18.228Z]       |         ^~~~~~
[2021-01-13T11:10:18.228Z]   CXX      uct/gtest-test_p2p_rma.o

yosefe avatar Jan 13 '21 12:01 yosefe

@rakhmets can you please tree/branches for v0/v1 ? thanks.

shamisp avatar Jan 13 '21 16:01 shamisp

@rakhmets can you please tree/branches for v0/v1 ? thanks.

@shamisp v0 was mid-december master branch. And v1 was v0 + changes from this pull request. I think it is more convenient to use current state of ucx from master branch as base version because it contains recently merged pull request. The changes in the pull request allows you to pass X,Y,... as size to ucx_perftest.

rakhmets avatar Jan 14 '21 17:01 rakhmets

bot:pipe:retest

rakhmets avatar Jan 19 '21 06:01 rakhmets

will be revised for uct_v2

yosefe avatar Feb 25 '21 17:02 yosefe