Zihao Ye
Some suggestions: 1. Official support for half precision (users should find it in pip wheels rather than having to compile the library themselves). 2. For PyTorch backends, we should have better...
I encountered a similar issue a year ago. I remember that cusparse fp16 requires some alignment (e.g. the array pointer address should be a multiple of 16/32/64, etc.) to be efficient.
@yaox12 I just found a code snippet about the alignment issue written by @nv-dlasalle last year: https://github.com/nv-dlasalle/dgl/commit/5fc6e9bfc5fbd59e0cf5dbc4510883a5a124a467 The matrix column needs to be aligned to 128 bytes for...
It's not only about the feature size; you should also check the pointer address of each operand (A, B, and C). To the best of my knowledge, achieving such performance on Reddit...
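Roughly, a check like the following could tell whether an operand is safe for the fp16 path (a hypothetical sketch, not DGL's actual dispatch logic; `is_fp16_spmm_friendly` is made up, and the 16-byte value is an assumption — the linked commit aligns matrix columns to 128 bytes):

```python
import torch

def is_fp16_spmm_friendly(feat: torch.Tensor, alignment_bytes: int = 16) -> bool:
    # pointer address of the operand should be a multiple of the alignment
    ptr_aligned = feat.data_ptr() % alignment_bytes == 0
    # each fp16 row occupies feat.size(1) * 2 bytes; if the row stride is not
    # aligned, every row after the first is misaligned as well
    row_bytes = feat.size(1) * feat.element_size()
    return ptr_aligned and row_bytes % alignment_bytes == 0

device = "cuda" if torch.cuda.is_available() else "cpu"
feat = torch.randn(1000, 602, device=device).half()
if not is_fp16_spmm_friendly(feat):
    # zero-pad the feature dimension up to the next aligned size
    pad_cols = (-feat.size(1) * feat.element_size()) % 16 // feat.element_size()
    feat = torch.cat([feat, feat.new_zeros(feat.size(0), pad_cols)], dim=1)
```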
Sputnik also has some issues with alignment for fp16: https://github.com/facebookresearch/xformers/issues/15
How much effort do we need to rewrite all appearances of `->ctx`?
@lipingcoding what do you mean by performance? Speed or final accuracy?
I'm a graduate researcher at UW and was a full-time SDE at AWS AI for years; my work is around Deep Learning Frameworks/Compilers. I feel like all of us...
I can confirm the problem still exists after upgrading dgl to v0.9.1.
> The precision interval for fp16 between 2048 and 4096 is 2. https://en.wikipedia.org/wiki/Half-precision_floating-point_format. Considering that elements in `feat_fp16` are between 0 and 1, they will be ignored due to the round-off...
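A tiny illustration of that round-off (the values here are picked just to show the effect, not taken from the actual workload):

```python
import torch

# fp16 spacing between 2048 and 4096 is 2, so a fractional addend is rounded away
acc = torch.tensor(2048.0, dtype=torch.float16)
print(acc + torch.tensor(0.7, dtype=torch.float16))  # tensor(2048., dtype=torch.float16)
print(torch.tensor(2048.0) + 0.7)                    # fp32 keeps it: tensor(2048.7000)
```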