DML

Unknown buffer size limitation for CRC operation

Open bartlomiejgrzeskowiak opened this issue 2 years ago • 8 comments

What is the acceptable input buffer size for the CRC operation?

I have been experimenting with different CRC buffer sizes. The DML library accepts different sizes, but in some cases it fails with an error or even a segmentation fault.

Example execution:

[bgrzesko@fl31ca105bs0411 build]$ ./examples/low-level-api/ll_crc_example_1KB hardware_path
The example will be run on the hardware path.
Starting CRC job example.
Caclulating CRC for region of size 1KB.
Calculated CRC is: 0x2cdf6e8f
Finished successfully.
[bgrzesko@fl31ca105bs0411 build]$ ./examples/low-level-api/ll_crc_example_4MB hardware_path
The example will be run on the hardware path.
Starting CRC job example.
Caclulating CRC for region of size 4MB.
An error (15) occured during job execution.
[bgrzesko@fl31ca105bs0411 build]$ ./examples/low-level-api/ll_crc_example_16MB hardware_path
Segmentation fault (core dumped)
[bgrzesko@fl31ca105bs0411 build]$ 

How to reproduce (apply the diff below and compile the example):

[bgrzesko@fl31ca105bs0411 build]$ git diff
diff --git a/examples/low-level-api/crc_example.c b/examples/low-level-api/crc_example.c
index 3c12df2..ad03704 100644
--- a/examples/low-level-api/crc_example.c
+++ b/examples/low-level-api/crc_example.c
@@ -9,7 +9,8 @@
 #include "dml/dml.h"
 #include "examples_utils.h"
 
-#define BUFFER_SIZE 1024 // 1 KB
+//#define BUFFER_SIZE 4 * 1024 * 1024 // 4 MB
+#define BUFFER_SIZE 16 * 1024 * 1024 // 16 MB
 
 /*
 * This example demonstrates how to create and run a crc operation.

bartlomiejgrzeskowiak, Oct 03 '23 10:10

Hi @bartlomiejgrzeskowiak, this happens on hardware_path only, correct? Could you please check the max_transfer_size setting configured for the WQ? (accel-config list | grep transfer)

mzhukova, Oct 09 '23 21:10

> Hi @bartlomiejgrzeskowiak, This is for hardware_path only, correct?

It is most probably the software path, since I am using crc_example.c, where DML_PATH_SW is set: https://github.com/intel/DML/blob/6d71051c405c2318d06aad96d3b0244ce8c4bcbe/examples/low-level-api/crc_example.c#L20

> Could you please check what is the max_transfer_size setting that you have set for the WQ? (accel-config list | grep transfer)

[bgrzesko@fl31ca105bs0411 ~]$ accel-config list | grep transfer
    "max_transfer_size":2147483648,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
    "max_transfer_size":2147483648,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,

bartlomiejgrzeskowiak, Oct 10 '23 08:10

Hi @bartlomiejgrzeskowiak, sorry, I was not clear. I believe you were running with hardware_path (meaning DSA was used for execution), since the command line is [bgrzesko@fl31ca105bs0411 build]$ ./examples/low-level-api/ll_crc_example_4MB hardware_path. I was just trying to double-check whether you observed a similar error on software_path/DML_PATH_SW as well.

Let me try to reproduce on my side and I'll get back to you.

mzhukova, Oct 10 '23 16:10

Hi @mzhukova,

You're totally right, I was running on the hardware path. Sorry for misleading you; it was some time ago and I did not notice that the command-line argument overrides the path.

Please let me know if you're able to reproduce the issue.

BR Bartek

bartlomiejgrzeskowiak, Oct 11 '23 13:10

Hey @bartlomiejgrzeskowiak, 16 MB is too large to be allocated on the stack. If you want to use a 16 MB buffer in the example, you need to allocate it with malloc() instead.

Simple godbolt example of a large stack allocation: https://godbolt.org/z/Ts9xndqcq
A quick reference I found on the default stack size on Linux being somewhere between 8 and 10 MB: https://unix.stackexchange.com/questions/473416/why-on-modern-linux-the-default-stack-size-is-so-huge-8mb-even-10-on-some-di
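A minimal sketch of the suggested change, assuming the example declares its source buffer as a local array sized by BUFFER_SIZE. The buffer is moved to the heap, so its size no longer counts against the stack limit. The helper name make_source_buffer is hypothetical, and the DML job setup is left as a comment because the job's field names are not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* 16 MB; parenthesized so the macro expands safely inside expressions. */
#define BUFFER_SIZE (16u * 1024u * 1024u)

/* Hypothetical helper: allocates the source buffer on the heap instead of
 * the stack and fills it with a repeating byte pattern.
 * Returns NULL if the allocation fails; the caller must free() the result. */
uint8_t *make_source_buffer(size_t size)
{
    uint8_t *buf = malloc(size);
    if (buf == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < size; ++i) {
        buf[i] = (uint8_t)(i % 256u);
    }
    return buf;
}

/* Usage sketch:
 *   uint8_t *src = make_source_buffer(BUFFER_SIZE);
 *   if (src == NULL) { report the error and exit }
 *   ... point the DML CRC job's source at src, length BUFFER_SIZE ...
 *   free(src);
 */
```

Raising the shell's stack limit (ulimit -s) would probably also avoid the crash, but heap allocation does not depend on how the process was launched.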

abdelrahim-hentabli, Oct 11 '23 18:10

Hey @bartlomiejgrzeskowiak, it seems that your work queues' max_transfer_size is 2 MB (2097152 bytes), which would explain the 4 MB example failing: a single 4 MB job exceeds that limit.
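If a CRC over a buffer larger than max_transfer_size is needed, one approach is to split the buffer into pieces no larger than the limit and chain the CRC, feeding each piece's result in as the seed of the next. The sketch below demonstrates the chaining with a software CRC-32C (Castagnoli) stand-in, not with DML; whether DML's CRC job accepts a seed from a previous result, and which polynomial the hardware uses, should be checked in the DML documentation. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Software CRC-32C (reflected polynomial 0x82F63B78), a stand-in for the
 * hardware CRC. Pass the previous result as `crc` to continue a calculation,
 * or 0 to start a new one. */
uint32_t crc32c_update(uint32_t crc, const uint8_t *data, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            /* Shift right; XOR in the polynomial when the low bit was set. */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Hypothetical helper: computes the CRC of `len` bytes in pieces of at most
 * `max_chunk` bytes (e.g. the WQ's max_transfer_size), chaining results. */
uint32_t crc32c_chunked(const uint8_t *data, size_t len, size_t max_chunk)
{
    assert(max_chunk > 0);
    uint32_t crc = 0;
    size_t offset = 0;
    while (offset < len) {
        size_t n = len - offset;
        if (n > max_chunk) {
            n = max_chunk;
        }
        /* With DML, one job of at most max_chunk bytes would be submitted
         * here, seeded with `crc` (assumption: verify seed support). */
        crc = crc32c_update(crc, data + offset, n);
        offset += n;
    }
    return crc;
}
```

With the 2097152-byte limit shown above, a 4 MB buffer would become two 2 MB jobs, and the chained result equals the CRC of the whole buffer.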

abdelrahim-hentabli, Oct 11 '23 22:10

Hi @abdelrahim-hentabli,

Ok, but:

  1. max_transfer_size can be configured by the system administrator, so I cannot know its value in advance. How can I get this value in my code? Which API function returns max_transfer_size?
  2. What about 16 MB? I would expect that neither the library nor the example ever crashes.

bartlomiejgrzeskowiak, Oct 12 '23 06:10

Hey @bartlomiejgrzeskowiak

  1. DML currently does not have an API to get max_transfer_size. You would need to use libaccel-config's API to get this value: accfg_wq_get_max_transfer_size().
  2. Please see my comment above: https://github.com/intel/DML/issues/36#issuecomment-1758324846
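libaccel-config's accfg_wq_get_max_transfer_size() is the route suggested in this thread. As a dependency-free alternative, the idxd kernel driver also exposes the per-WQ value as a sysfs attribute; the path /sys/bus/dsa/devices/wq0.0/max_transfer_size in the usage comment is an assumption for device 0, queue 0 and should be checked on the target system. read_max_transfer_size is a hypothetical helper.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: reads one unsigned decimal value (such as a WQ's
 * max_transfer_size) from a text file. Returns 0 on success, -1 on error. */
int read_max_transfer_size(const char *path, uint64_t *out)
{
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        return -1;
    }
    uint64_t value = 0;
    int matched = fscanf(f, "%" SCNu64, &value);
    fclose(f);
    if (matched != 1) {
        return -1;
    }
    *out = value;
    return 0;
}

/* Usage sketch (assumed sysfs path; device and WQ ids vary per system):
 *   uint64_t max_size;
 *   if (read_max_transfer_size("/sys/bus/dsa/devices/wq0.0/max_transfer_size",
 *                              &max_size) == 0)
 *       printf("max_transfer_size: %" PRIu64 " bytes\n", max_size);
 */
```

A job larger than the returned value would then need to be split before submission.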

abdelrahim-hentabli, Oct 12 '23 17:10