neavo comments

Results: 89 comments of neavo

During pre-training, using FA2 consumes more memory than using SDPA

Still unable to identify the root cause of the issue, but a "silly" solution has been found. I observed that when FA2 is enabled and causes abnormal GPU memory usage,...

During pre-training, using FA2 consumes more memory than using SDPA

> Hello, Thanks for the thorough investigation! Clearing memory should not be required when training, so I think there is indeed a need for a cleaner fix. IIRC, I saw...

During pre-training, using FA2 consumes more memory than using SDPA

Hello everyone, is there any update? I eventually completed the training using SDPA ([modern_bert_multilingual](https://huggingface.co/neavo/modern_bert_multilingual)), but it would be even better if the FA2 memory issue could be resolved.

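[Feature Request] Drop simple Chinese-character substitutions when building the glossary

Very early versions did drop terms made up purely of Chinese characters outright, because models back then were quite weak and the glossary basically only served to pin down translated names. Models are far more capable now, though, and besides pinning down translated names, the glossary also supplies auxiliary information that helps the translation. The most typical case is providing gender information to help the model choose the right personal pronouns. In that situation, keeping even pure Chinese-character terms in the glossary is meaningful. That is the current thinking.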

Glossary search should support a case-sensitivity toggle

This can be added.

Glossary search should support a case-sensitivity toggle

Done: [MANUAL_BUILD_v0.28.2](https://github.com/neavo/LinguaGacha/releases/tag/MANUAL_BUILD_v0.28.2)

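[Feature Request] Add examples of correct entity extraction to the prompt to improve format correctness and entity-recognition accuracy

> This project's pipeline looks a lot like RAG; you could also add GraphRAG-style examples to the prompt: https://github.com/microsoft/graphrag/blob/main/graphrag/prompts/index/entity_extraction.py
>
> Also, are there any good local LLMs you would recommend? Open-source models that support both Chinese and Japanese and are also "smart" enough are hard to find.

Splitting the tasks and the pipeline into finer steps can indeed improve the final result, but it also greatly increases time and token consumption and places higher demands on the model's capability, so it is a trade-off.

The project's recent improvements go in exactly the opposite direction: merging tasks as much as possible while preserving quality, to reduce consumption while still letting small local models produce acceptable results.

Development uses the Qwen2.5-7B that ships in the one-click package.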

neavo

During pre-training, using FA2 consumes more memory than using SDPA

[Feature Request] Drop simple Chinese-character substitutions when building the glossary

Glossary search should support a case-sensitivity toggle

[Feature Request] Add examples of correct entity extraction to the prompt to improve format correctness and entity-recognition accuracy

Feature suggestion: import into a t++ project without re-translating

[Feature Request] Shutdown Timer

Could you consider supporting running on Mac?