HUANG Fei
@yzy5630 To my understanding, LightSeq does not support learnable positional embeddings in the current version. That may explain the differences you see.
In the latest version of fairseq (I'm using https://github.com/pytorch/fairseq/tree/420136acd2a57de22e62f13930aa23e086bcbbf8), ``args.device_id`` is not set correctly, so all LightSeq modules will allocate their memory on device 0. Notice the ``local_rank`` below: https://github.com/bytedance/lightseq/blob/812d9d798e491ab9139c1f36113693308c4c0637/lightseq/training/cli/fs_modules/ls_transformer.py#L148-L160...
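A minimal workaround sketch (``args.device_id`` is the field the linked code reads; whether ``torch.cuda.current_device()`` is the right source for it under your particular launcher is an assumption, so verify against your distributed setup):

```python
import torch

def fix_device_id(args):
    """Point args.device_id at the device this rank actually uses.

    LightSeq reads args.device_id when allocating parameters, so if
    fairseq leaves it at the default 0, every rank allocates on device 0.
    Call this before the LightSeq modules are constructed.
    """
    if torch.cuda.is_available():
        args.device_id = torch.cuda.current_device()
    return args
```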
This project is not a pip package. You need to copy the file, i.e. ``gpu_mem_track.py``, into your working directory.
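A minimal usage sketch, assuming the ``MemTracker`` class exported by ``gpu_mem_track.py`` (check the repo's README for the exact constructor arguments, which have changed between versions):

```python
import torch
from gpu_mem_track import MemTracker  # the file copied into your working directory

gpu_tracker = MemTracker()   # writes a log of GPU tensor changes to a file
gpu_tracker.track()          # snapshot before the allocation

x = torch.randn(1024, 1024, device='cuda')
gpu_tracker.track()          # snapshot after: the new tensor shows up in the log
```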
Total Used Memory is the peak of the memory usage. When you delete some tensors, PyTorch does not release the space back to the device until you call torch.cuda.empty_cache(), like the...
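A short sketch of this caching behavior, using the standard ``torch.cuda`` statistics APIs:

```python
import torch

x = torch.randn(1024, 1024, device='cuda')
print(torch.cuda.memory_allocated())  # ~4 MB held by the tensor
print(torch.cuda.memory_reserved())   # >= allocated: the caching allocator's pool

del x
print(torch.cuda.memory_allocated())  # drops back down
print(torch.cuda.memory_reserved())   # unchanged: the freed block stays cached

torch.cuda.empty_cache()              # return cached blocks to the device
print(torch.cuda.memory_reserved())   # now (close to) 0
```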
Hello. I just answered the question in my PR. It is because the CUDA kernels take some space. If you are interested, you can see the revised text here: https://github.com/hzhwcmhf/Pytorch-Memory-Utils/blob/master/README.md#faqs...
@SCAUapc We use the PyTorch API to obtain the memory usage. You can see the explanation of ``torch.cuda.memory_allocated`` [here](https://pytorch.org/docs/stable/generated/torch.cuda.memory_allocated.html?highlight=memory_allocated#torch.cuda.memory_allocated) > This is likely less than the amount shown in nvidia-smi since...
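A small sketch of the gap: after the first CUDA operation, nvidia-smi also counts the CUDA context and loaded kernel images, which ``memory_allocated`` never sees:

```python
import torch

x = torch.zeros(1, device='cuda')     # first CUDA op: creates the CUDA context
print(torch.cuda.memory_allocated())  # only a few hundred bytes for the tensor
# nvidia-smi at this point typically shows hundreds of MB for this process:
# the CUDA context plus kernel images, which PyTorch's counters do not include.
```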
A classmate already compiled notes last year: https://github.com/thunlp/OOP-THU/issues/47 If your notes only repeat the lecture content, the contribution may be limited, and people probably don't want to see duplicated material. If you have anything new, you are welcome to add it as a supplement to the earlier notes.
@EGalahad First question: yes, ``#ifdef`` and ``#pragma once`` are both preprocessor directives, so they only take effect at compile time. Second question: the ``.h`` file provides the declarations needed to call the functions; if you don't include ``func1.h`` in ``main.cpp``, you have to declare the functions by hand, otherwise the compiler cannot know the types involved in the call.
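A minimal sketch of the two options (only ``func1.h`` comes from the question; the function name ``add`` is a made-up placeholder):

```cpp
// func1.h
#pragma once            // preprocessor directive: handled before compilation
int add(int a, int b);  // declaration only; the definition lives in func1.cpp

// main.cpp -- option 1: include the header to get the declaration
#include "func1.h"

// main.cpp -- option 2: declare by hand (must match the definition exactly)
// int add(int a, int b);

int main() {
    return add(1, 2);   // without a visible declaration, this won't compile
}
```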
I suggest citing https://github.com/thu-coai/THUOOP/issues/11 at the beginning. It would be best to first explain, or link to, what a range-based for loop actually requires (as far as I know, only ``begin`` and ``end``). For the extra ``using`` declarations later on, please give some examples so readers know what they are for.
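A minimal sketch of that requirement (the class and member names are made up for illustration):

```cpp
#include <iostream>

// A range-based for loop only needs begin() and end() returning something
// that supports operator!=, operator++, and operator*.
struct Range {
    int lo, hi;
    struct Iter {
        int v;
        bool operator!=(const Iter& o) const { return v != o.v; }
        void operator++() { ++v; }
        int operator*() const { return v; }
    };
    Iter begin() const { return {lo}; }
    Iter end() const { return {hi}; }
};

int main() {
    for (int x : Range{0, 3})   // desugars to the begin()/end() protocol above
        std::cout << x << ' ';  // prints: 0 1 2
}
```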
You're right, three overloads are indeed needed. Regarding the example, could you go into more detail and give actual code showing the difference between these definitions? I see the STL has a function, std::advance, that is O(1) for random-access iterators and linear for the others. Is its implementation selected based on exactly these traits?
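A sketch of the classic tag-dispatch pattern that standard library implementations typically use for ``std::advance`` (the ``my_``-prefixed names are made up; negative distances and the separate bidirectional overload are omitted for brevity):

```cpp
#include <iterator>
#include <list>
#include <vector>

// O(1) path: random-access iterators support += directly.
template <class It, class Dist>
void my_advance_impl(It& it, Dist n, std::random_access_iterator_tag) {
    it += n;
}

// O(n) fallback: step one element at a time.
template <class It, class Dist>
void my_advance_impl(It& it, Dist n, std::input_iterator_tag) {
    while (n-- > 0) ++it;
}

// Dispatch at compile time on the iterator_category trait.
template <class It, class Dist>
void my_advance(It& it, Dist n) {
    my_advance_impl(it, n,
        typename std::iterator_traits<It>::iterator_category{});
}

int main() {
    std::vector<int> v{1, 2, 3, 4};
    auto vi = v.begin();
    my_advance(vi, 2);  // exact tag match: picks the O(1) overload

    std::list<int> l{1, 2, 3, 4};
    auto li = l.begin();
    my_advance(li, 2);  // bidirectional tag converts to input tag: O(n) overload
}
```

Overload resolution does the work here: the random-access tag matches its overload exactly, while every weaker category converts to its base class ``input_iterator_tag`` and falls into the linear version.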