OverflowError: cannot fit 'int' into an index-sized integer
Command used:
visualdl --logdir output/combine_all_0411131554_paddle/ --host 0.0.0.0
After running visualdl, the following error is reported:
VisualDL 2.2.3
Traceback (most recent call last):
File "/home/hechanghong/miniconda3/envs/paddle2.1/bin/visualdl", line 8, in <module>
sys.exit(main())
File "/home/hechanghong/miniconda3/envs/paddle2.1/lib/python3.8/site-packages/visualdl/server/app.py", line 177, in main
_run(args)
File "/home/hechanghong/miniconda3/envs/paddle2.1/lib/python3.8/site-packages/visualdl/server/app.py", line 156, in _run
app = create_app(args)
File "/home/hechanghong/miniconda3/envs/paddle2.1/lib/python3.8/site-packages/visualdl/server/app.py", line 65, in create_app
api_call = create_api_call(args.logdir, args.model, args.cache_timeout)
File "/home/hechanghong/miniconda3/envs/paddle2.1/lib/python3.8/site-packages/visualdl/server/api.py", line 250, in create_api_call
api = Api(logdir, model, cache_timeout)
File "/home/hechanghong/miniconda3/envs/paddle2.1/lib/python3.8/site-packages/visualdl/server/api.py", line 65, in __init__
self._reader = LogReader(logdir)
File "/home/hechanghong/miniconda3/envs/paddle2.1/lib/python3.8/site-packages/visualdl/reader/reader.py", line 89, in __init__
self.load_new_data(update=True)
File "/home/hechanghong/miniconda3/envs/paddle2.1/lib/python3.8/site-packages/visualdl/reader/reader.py", line 354, in load_new_data
self.add_remain()
File "/home/hechanghong/miniconda3/envs/paddle2.1/lib/python3.8/site-packages/visualdl/reader/reader.py", line 294, in add_remain
remain = self.reader.get_remain()
File "/home/hechanghong/miniconda3/envs/paddle2.1/lib/python3.8/site-packages/visualdl/reader/record_reader.py", line 106, in get_remain
for item in self._reader:
File "/home/hechanghong/miniconda3/envs/paddle2.1/lib/python3.8/site-packages/visualdl/reader/record_reader.py", line 60, in __next__
self._reader.get_next()
File "/home/hechanghong/miniconda3/envs/paddle2.1/lib/python3.8/site-packages/visualdl/reader/record_reader.py", line 40, in get_next
event_str = self.file_handle.read(header_len)
File "/home/hechanghong/miniconda3/envs/paddle2.1/lib/python3.8/site-packages/visualdl/io/bfile.py", line 592, in read
self.buff, self.continuation_token = self.fs.read(
File "/home/hechanghong/miniconda3/envs/paddle2.1/lib/python3.8/site-packages/visualdl/io/bfile.py", line 121, in read
data = fp.read(size)
OverflowError: cannot fit 'int' into an index-sized integer
It looks like the data in the log file is corrupted. Could you send us a copy of the log file so we can debug it?
Sure. I have an 8.4 MB log file. How should I send it to you?
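These are the byte lengths I got by parsing each record in the log; the position where the error occurs is shown above. One record has a byte length of 10734638070951275615, and just before it there are several records with length 0. My guess is that the written data starts going wrong from this point. What kind of data were you recording? The write stream probably became corrupted starting at the zero-length records, so the parser later interpreted bytes that are not a length field as a record length. The value 10734638070951275615 only fits in an 8-byte unsigned integer; it exceeds the maximum of an 8-byte signed integer (9223372036854775807), which is probably why the OverflowError is raised.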
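To make the reasoning above concrete, here is a minimal sketch of such a length scan. It runs on a synthetic in-memory stream, not on the log from this thread. The record framing (8-byte little-endian length, 4-byte CRC, payload, 4-byte CRC) is an assumption about the log format and is not confirmed anywhere in this thread; `frame`, `scan_lengths`, and `read_overflows` are hypothetical helper names. The sketch also checks that passing the reported length to a file-like `read()` raises `OverflowError`, because the value is larger than `sys.maxsize`.

```python
import io
import struct
import sys

BAD_LENGTH = 10734638070951275615  # record length reported in this thread
HEADER = 8 + 4   # ASSUMED framing: u64 length + 4-byte CRC of the length
FOOTER = 4       # ASSUMED framing: 4-byte CRC of the payload

def frame(payload: bytes) -> bytes:
    """Build one record under the assumed framing (CRCs zeroed, never checked)."""
    return struct.pack("<Q", len(payload)) + b"\x00" * 4 + payload + b"\x00" * FOOTER

def scan_lengths(data: bytes):
    """Yield (offset, length, plausible) per header; stop at the first implausible one."""
    pos = 0
    while pos + 8 <= len(data):
        (length,) = struct.unpack_from("<Q", data, pos)
        # A length is plausible only if the whole record fits in the remaining bytes.
        plausible = length <= len(data) - pos - HEADER - FOOTER
        yield pos, length, plausible
        if not plausible:
            return
        pos += HEADER + length + FOOTER

def read_overflows(size: int) -> bool:
    """Return True if a file-like read(size) raises OverflowError."""
    try:
        io.BytesIO(b"payload").read(size)
    except OverflowError:
        return True
    return False

# Synthetic stream: a valid record, a zero-length record, then misaligned bytes
# whose first 8 bytes happen to decode to the length seen in this thread.
stream = frame(b"abc") + frame(b"") + struct.pack("<Q", BAD_LENGTH) + b"garbage"
for offset, length, plausible in scan_lengths(stream):
    print(offset, length, "ok" if plausible else "implausible")

print("exceeds sys.maxsize:", BAD_LENGTH > sys.maxsize)
print("read() overflows:", read_overflows(BAD_LENGTH))
```

The scan stops at the first implausible length because every later offset would be computed from a bad value. On a real file, the offset of that record is the place to start looking for what was written incorrectly.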
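But I only call writer.add_scalar(f'{k}_eval_loss', loss_dict[k], global_step[k]), this single logging API, so there is no multi-process write conflict. The loss is also printed to the regular log at the same time and looks fine there. The failure appears suddenly partway through a run. Could it be caused by something like a caching bug in VisualDL?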
What values does global_step[k] hold?
These are all the operations on global_step; I don't think there is a problem here:
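from collections import defaultdict

global_step = defaultdict(int)  # per-task step counter, starts at 0
for epoch in range(num_epoches):
    for task, data in dataloader:
        # (the forward pass that computes `loss` is omitted in this excerpt)
        writer.add_scalar(f'{task}_train_loss', loss.item(), global_step[task])
        global_step[task] += 1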
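That looks fine. Is the dataloader one you wrote yourselves, and is task the name of the task? Is this issue 100% reproducible? If you write print(f'{task}_train_loss', loss.item(), global_step[task]) to a file, does anything abnormal show up?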
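You could add a print line right above writer.add_scalar and redirect standard output to a text file while the program runs. When the error occurs, you will then know which line failed to be written by LogWriter. If this helps you find the cause, please let us know which call was the problematic one.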
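The suggestion above could look roughly like the following sketch. `logged_add_scalar` is a hypothetical helper, and `RecordingWriter` is only a stand-in so the example runs without VisualDL; in the real training script the `writer` would be the existing LogWriter instance, called with the same three positional arguments used earlier in this thread.

```python
import sys

def logged_add_scalar(writer, tag, value, step, stream=sys.stdout):
    """Print the arguments first, then forward them to writer.add_scalar."""
    # flush=True so the line reaches the redirected file even if the
    # process stops right after this call.
    print(f"add_scalar tag={tag!r} value={value!r} step={step!r}",
          file=stream, flush=True)
    writer.add_scalar(tag, value, step)

class RecordingWriter:
    """Stand-in for the real writer (assumption: same add_scalar argument order)."""

    def __init__(self):
        self.calls = []

    def add_scalar(self, tag, value, step):
        self.calls.append((tag, value, step))

# Demo with made-up values; the real loop would pass loss.item() and global_step[task].
demo_writer = RecordingWriter()
logged_add_scalar(demo_writer, "taskA_train_loss", 0.25, 0)
logged_add_scalar(demo_writer, "taskA_train_loss", 0.20, 1)
```

Running the training script with standard output redirected, for example `python train.py > add_scalar_trace.txt` (the script name is a placeholder), leaves a trace whose entries can be compared one by one with the records parsed from the log file.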
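It is paddle.io.DataLoader. The recent versions of the project code do reproduce the problem almost every time. Thanks for the suggestion; I will rerun the experiments and report back.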