
NA BMI: cannot transfer data larger than 16 MB

Open soumagne opened this issue 7 years ago • 4 comments

When transferring large data with the NA BMI plugin, Tang is reporting the following error:

# NA -- Error -- /global/homes/w/wzhang5/software/mercury/src/na/na_bmi.c:1773
[E 15:01:01.028255] src/io/bmi/bmi_tcp/bmi-tcp.c line 1313: Error: BMI message too large!
 # na_bmi_get(): BMI_post_recv() failed
# HG -- Error -- /global/homes/w/wzhang5/software/mercury/src/mercury_bulk.c:679
 # hg_bulk_transfer_pieces(): Could not transfer data
# HG -- Error -- /global/homes/w/wzhang5/software/mercury/src/mercury_bulk.c:817
 # hg_bulk_transfer(): Could not transfer data pieces
# HG -- Error -- /global/homes/w/wzhang5/software/mercury/src/mercury_bulk.c:1477
 # HG_Bulk_transfer(): Could not transfer data
Could not read bulk data
[E 15:01:01.029567]     [bt] /global/homes/w/wzhang5/software/bmi/build/lib/libbmi.so(BMI_tcp_post_send_list+0x15f) [0x2aaaab80774f]
[E 15:01:01.029603]     [bt] /global/homes/w/wzhang5/software/bmi/build/lib/libbmi.so(BMI_post_send+0x50) [0x2aaaab80d3e0]
[E 15:01:01.029607]     [bt] /global/homes/w/wzhang5/software/mercury/build/bin/libna.so.0.9.0(+0x67e8) [0x2aaaab3ef7e8]
[E 15:01:01.029610]     [bt] /global/homes/w/wzhang5/software/mercury/build/bin/libna.so.0.9.0(+0x6d1b) [0x2aaaab3efd1b]
[E 15:01:01.029613]     [bt] /global/homes/w/wzhang5/software/mercury/build/bin/libna.so.0.9.0(NA_Progress+0x254) [0x2aaaab3ed824]
[E 15:01:01.029616]     [bt] /global/homes/w/wzhang5/software/mercury/build/bin/libmercury.so.0.9.0(+0x56b9) [0x2aaaaacd36b9]
[E 15:01:01.029619]     [bt] /global/homes/w/wzhang5/software/mercury/build/bin/libmercury.so.0.9.0(HG_Core_progress+0xf) [0x2aaaaacd6e6f]
[E 15:01:01.029622]     [bt] /global/homes/w/wzhang5/software/SoMeta2/api/build/bin/pdc_server.exe(main+0x2e9) [0x404b89]
[E 15:01:01.029625]     [bt] /lib64/libc.so.6(__libc_start_main+0xf5) [0x2aaaacc7aac5]
[E 15:01:01.029628]     [bt] /global/homes/w/wzhang5/software/SoMeta2/api/build/bin/pdc_server.exe() [0x404c05]
# NA -- Error -- /global/homes/w/wzhang5/software/mercury/src/na/na_bmi.c:2154
 # na_bmi_progress_rma(): BMI_post_send() failed
# NA -- Error -- /global/homes/w/wzhang5/software/mercury/src/na/na_bmi.c:1892
 # na_bmi_progress_unexpected(): Could not make RMA progress
# NA -- Error -- /global/homes/w/wzhang5/software/mercury/src/na/na_bmi.c:1812
 # na_bmi_progress(): Could not make unexpected progress
# HG -- Error -- /global/homes/w/wzhang5/software/mercury/src/mercury_core.c:2143
 # hg_core_progress_na(): Could not make NA Progress
# HG -- Error -- /global/homes/w/wzhang5/software/mercury/src/mercury_core.c:3489
 # HG_Core_progress(): Could not make progress
# NA -- Error -- /global/homes/w/wzhang5/software/mercury/src/na/na_bmi.c:1773
 # na_bmi_get(): BMI_post_recv() failed

There seems to be a TCP_MODE_REND_LIMIT of 16 MB set in src/io/bmi/bmi_tcp/bmi-tcp.c.

soumagne avatar Aug 03 '17 16:08 soumagne

@carns Phil, are you aware of that limit?

soumagne avatar Aug 03 '17 16:08 soumagne

Yes, that's right unfortunately. The reason there is a limit at all (conceptually) is that it constrains the amount of data that will be streamed in a socket between control headers. If it is arbitrarily large, then other messages that you would like to send over the socket will be starved.

It doesn't matter for memory usage though, since, as the #define name implies, this only affects rendezvous mode.

In PVFS we didn't hit this limit, because PVFS itself would chunk up data into smaller units before issuing BMI operations. Does Mercury have the ability to do that on a bulk transfer by any chance?

carns avatar Aug 04 '17 20:08 carns

OK, yes, we should be able to do that, either at the HG bulk level, by returning the number of bytes transmitted, or directly at the NA plugin level.
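
A minimal sketch of the chunking idea, assuming a per-operation cap equal to the 16 MB limit discussed above. The names `MAX_CHUNK_SIZE`, `chunk_xfer_cb_t`, `transfer_in_chunks` and `count_bytes_cb` are hypothetical and are not part of Mercury or BMI. In a real plugin the callback would post one BMI operation per piece.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cap per operation, mirroring the 16 MB rendezvous limit
 * mentioned in this thread (hypothetical name, not a BMI symbol). */
#define MAX_CHUNK_SIZE ((size_t)16 * 1024 * 1024)

/* Hypothetical callback that transfers one piece of the buffer.
 * Returns 0 on success, non-zero on failure. */
typedef int (*chunk_xfer_cb_t)(size_t offset, size_t len, void *arg);

/* Split a transfer of `total` bytes into pieces of at most `max_chunk`
 * bytes and hand each piece to `xfer` in order. Stops at the first
 * failing piece and returns its error code; returns 0 on success.
 * If `n_chunks` is non-NULL it receives the number of pieces issued. */
static int
transfer_in_chunks(size_t total, size_t max_chunk, chunk_xfer_cb_t xfer,
    void *arg, size_t *n_chunks)
{
    size_t offset = 0;
    size_t count = 0;

    assert(max_chunk > 0);
    assert(xfer != NULL);

    while (offset < total) {
        size_t remaining = total - offset;
        size_t len = (remaining < max_chunk) ? remaining : max_chunk;
        int rc = xfer(offset, len, arg);

        if (rc != 0)
            return rc;
        offset += len;
        count++;
    }
    if (n_chunks != NULL)
        *n_chunks = count;
    return 0;
}

/* Stand-in callback for illustration: only counts the bytes it is given. */
static int
count_bytes_cb(size_t offset, size_t len, void *arg)
{
    (void) offset;
    *(size_t *) arg += len;
    return 0;
}
```

With this split, a 40 MB transfer would be issued as two 16 MB pieces and one 8 MB piece, each of which stays at or under the cap.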

soumagne avatar Aug 04 '17 21:08 soumagne

A short-term workaround would be to crank up that #define if we don't have chunking capability yet :) BMI should technically work with a larger limit, and will actually perform OK too until you have multiple transfers on the same address pair simultaneously.

carns avatar Aug 04 '17 21:08 carns