glowwormX
sdk1.0.3
1. The JSON returned by TxService.getTransactionsCountByContractAddr should not be deserialized with TxResponse
```
@Override
public Request getTransactionsCountByContractAddr(String from, String to, String contractAddress, boolean txExtra, int... nodeIds) {
    TxRequest txRequest = new TxRequest(TX_PREFIX + "getTransactionsCountByContractAddr", providerManager, TxResponse.class, nodeIds);
    HashMap params = ...
```
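For context, a minimal sketch of the kind of change the report seems to ask for, keeping the `TxRequest` constructor shown above. The response type `TxCountResponse`, the `addParams` call, and the parameter names are all hypothetical and only illustrate swapping in a count-specific deserialization class:

```java
@Override
public Request getTransactionsCountByContractAddr(String from, String to, String contractAddress,
                                                  boolean txExtra, int... nodeIds) {
    // Hypothetical: pass a count-specific response class instead of TxResponse,
    // so the JSON returned by this API (a count, not a full transaction list)
    // is deserialized into matching fields.
    TxRequest txRequest = new TxRequest(TX_PREFIX + "getTransactionsCountByContractAddr",
            providerManager, TxCountResponse.class, nodeIds);

    // Parameter handling below is an assumption for illustration only.
    HashMap<String, Object> params = new HashMap<>();
    params.put("from", from);
    params.put("to", to);
    params.put("address", contractAddress);
    params.put("txExtra", txExtra);
    txRequest.addParams(params);

    return txRequest;
}
```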
In the demo, every contract invocation requires writing a new class. I tried the lambda-expression style instead, and `org.apache.bcel.util` reports a class-not-found error (see the sketch after this snippet).
```
// invoke: registration
//Transaction transaction1 = new Transaction.HVMBuilder(account.getAddress()).invoke(contractAddress, new InvokeStudentReg()).build();
// use a lambda expression to avoid writing a new class
BaseInvoke register = iStudent -> iStudent.registerStudent(Arrays.asList(new Student("id1", "name1", 20), new Student("id2", "name2", 20)));
Transaction transaction1 = new Transaction.HVMBuilder(account.getAddress()).invoke(contractAddress,...
```
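One possible explanation (an assumption, not confirmed here) is that a lambda compiles to a class generated at runtime, so bytecode-level tooling such as BCEL cannot find a `.class` file for it on the classpath, whereas an anonymous inner class is compiled to its own class file. A minimal sketch of that workaround, reusing the demo's `BaseInvoke`, `IStudent` and `Student` types; the generic parameters, the `invoke` method name, and the `Boolean` return type of `registerStudent` are assumptions for illustration:

```java
// Anonymous inner class instead of a lambda: it is compiled to a real
// .class file (e.g. Demo$1.class), which bytecode readers such as BCEL
// can locate, unlike the runtime-generated class behind a lambda.
BaseInvoke<Boolean, IStudent> register = new BaseInvoke<Boolean, IStudent>() {
    @Override
    public Boolean invoke(IStudent iStudent) {
        return iStudent.registerStudent(Arrays.asList(
                new Student("id1", "name1", 20),
                new Student("id2", "name2", 20)));
    }
};

Transaction transaction1 = new Transaction.HVMBuilder(account.getAddress())
        .invoke(contractAddress, register)
        .build();
```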
### Reminder

- [x] I have read the above rules and searched the existing issues.

### System Info

main

### Reproduction

I saw in the Changelog that DeepSeek V3 and R1 are supported, but the commits only add a template, with no other implementation. Is there a working example? My understanding is that getting this to run requires expert parallelism: DeepSeek-V3 in bf16 is 1.3T, and with ZeRO-3 every expert's parameters would have to be all-gathered, which is far too much communication. In addition, [huggingface's modeling_deepseek](https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/modeling_deepseek.py) does not support training (line 439: `assert not self.training`), so a separate implementation would be needed as well...
I trained on the NPU using FP16 and found many NaN values in step 1 of the training results.
```
(TaskRunner pid=1218449) [2025-11-19 17:41:06,020] [INFO] [aggregate_logger.py:54:log]: step:1 actor/entropy:0.8346855640411377 training/rollout_probs_diff_valid:1 training/rollout_probs_diff_max:nan...
```
I am using the verl main code from 10/30, with the environment installed following [Dockerfile.ascend_8.2.rc1_a2](https://github.com/volcengine/verl/blob/main/docker/Dockerfile.ascend_8.2.rc1_a2). Running recipe/dapo/run_dapo_qwen3_moe_30b_megatron_npu.sh, initialization and rollout both complete, but training fails with:
```
ray.exceptions.RayTaskError(RuntimeError): ray::WorkerDict.actor_rollout_update_actor() (pid=490071, ip=172.16.2.11, actor_id=dee0d43a6f32372ec4ff655e04000000, repr=)
  File "/cache/ray_temp/session_2025-10-31_17-44-09_992100_1145212/runtime_resources/working_dir_files/_ray_pkg_d85728c4d7bda8f2/verl/single_controller/ray/base.py", line 700, in func
    return getattr(self.worker_dict[key], name)(*args, **kwargs)
  File "/cache/ray_temp/session_2025-10-31_17-44-09_992100_1145212/runtime_resources/working_dir_files/_ray_pkg_d85728c4d7bda8f2/verl/single_controller/base/decorator.py", line 442, in inner
    return func(*args, **kwargs)
...
```
Running 30b with:
```
actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
actor_rollout_ref.rollout.data_parallel_size=2 \
actor_rollout_ref.rollout.expert_parallel_size=4 \
```
On A2 I want to run 235b with tp=8 dp=2, or some other higher-performance sharding strategy; testing on 30b first, it errors out. Removing dp and ep makes it run normally; both fsdp and megatron report the error.

verl code from 11/18, commit 51d2104ecb61563c41123a8f0bce2f06b18387dc
vllm 0.11.0.rc2
cann 8.3.rc2

Log:
```
(WorkerDict pid=3381096) INFO 12-04 17:33:17 [layer.py:332] FlashInfer CUTLASS...
```