[BUG] swanlab sync error
🐛 Bug description [Please make everyone to understand it]
When I try to sync an offline log, it sometimes raises this error:
swanlab/data/porter/datastore.py", line 126, in scan
assert pad == pad_check, "invalid padding"
^^^^^^^^^^^^^^^^
AssertionError: invalid padding
The error occurs intermittently; most of the time the sync works.
🧑💻 Step to reproduce
swanlab sync ~/swanlog/xxx/
👾 Expected result
Write down the results you expect
🚑 Any additional [like screenshots]
- SwanLab Version: swanboard 0.1.8b1 swankit 0.2.4 swanlab 0.6.8
- Platform: Ubuntu 20.04
Did you perform a sync operation during the experiment run?
Yes, but this always works. Is this action causing the error?
Yes, but we allow this operation, so it's my issue, and I'll fix it soon.
Thanks a lot. There is another issue that affects the experience: when I sync the same experiment twice, it creates two experiment IDs on the web. It would be better to keep them as one 😊
Indeed, this issue arose due to a flaw in our initial design, which we plan to address in the future. Perhaps you could open a new issue for us to track this problem?
Sure. I also wonder whether there is a quick way to sync the broken experiment's details, because a training session takes a long time.
If you use sync with --id, you can resume the training session
docs: https://docs.swanlab.cn/api/cli-swanlab-sync.html#swanlab-sync
I tried using swanlab sync ./swanlog/run-xxx --id
But it still raises:
python3.11/site-packages/swanlab/data/porter/datastore.py", line 126, in scan
assert pad == pad_check, "invalid padding"
^^^^^^^^^^^^^^^^
AssertionError: invalid padding
Should I upgrade swanlab, or do something else?
I tried to resolve it in #199, here's a whl package for you:
swanlab-0.6.11b0-py3-none-any.whl.zip
You can try extracting it and then install using the following command:
pip install swanlab-0.6.11b0-py3-none-any.whl
Perhaps the error will no longer appear?
Does it work for the already broken log, or only for new logs?
For the previously broken log it still does not work:
File "/lib/python3.11/site-packages/swanlab/sync/__init__.py", line 65, in sync
proj, exp = porter.parse()
^^^^^^^^^^^^^^
File "lib/python3.11/site-packages/swanlab/data/porter/__init__.py", line 54, in wrapper
return wrapped(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "lib/python3.11/site-packages/swanlab/data/porter/__init__.py", line 403, in parse
for record in self._f:
File "lib/python3.11/site-packages/swanlab/data/porter/datastore.py", line 157, in __next__
record = self.scan()
^^^^^^^^^^^
File "lib/python3.11/site-packages/swanlab/data/porter/datastore.py", line 126, in scan
assert pad == pad_check, "invalid padding"
^^^^^^^^^^^^^^^^
AssertionError: invalid padding
$ pip list | grep swanlab
swanlab 0.6.11b0
Sorry, I got the wrong version, it's actually this one:
swanlab-0.6.11b1-py3-none-any.whl.zip
It works for broken logs. Give it a try!
Well, the sync seems to complete, but I trained for 500 steps and the log ends at step 210. Why did the rest of the log disappear? I use verl for training, so I think this is not the RL framework's fault?
The issue likely lies in the line assert pad == pad_check, "invalid padding". In fact, there shouldn't be any problem with this line. Could you package the problematic log files and send them to my email at [email protected]? Perhaps I can debug and identify the issue.
Logging a related issue
- Problem: After the training process was killed unexpectedly (e.g., due to a server crash), the log file was not properly closed, leaving incomplete data. This causes the swanlab sync command to fail when it tries to upload the experiment.
- The fix: The upload worked after I installed and used a specific test version: swanlab-0.6.11b1-py3-none-any.whl.zip
- SwanLab Version: swanboard 0.1.8b1 swankit 0.2.4 swanlab 0.6.8
- Platform: Ubuntu 20.04
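For anyone curious why an unclosed log can break a record scanner, here is a minimal sketch of the general idea. It is not SwanLab's code: the on-disk format used by swanlab/data/porter/datastore.py is not shown in this thread, so the layout below (a length plus CRC32 header before each payload) and the names `write_record`, `read_records`, and `demo` are all hypothetical. The sketch only shows how a reader can stop at a truncated tail instead of failing on an assertion.

```python
import io
import struct
import zlib

# Hypothetical record layout (NOT SwanLab's real format):
# 4-byte little-endian payload length, 4-byte CRC32 of the payload, then the payload.
HEADER = struct.Struct("<II")


def write_record(stream, payload: bytes) -> None:
    """Append one record: an 8-byte header followed by the payload."""
    stream.write(HEADER.pack(len(payload), zlib.crc32(payload)))
    stream.write(payload)


def read_records(stream):
    """Yield payloads until the stream ends or a damaged record is found.

    A truncated or corrupt tail ends iteration instead of raising, so
    every record written before the crash can still be read.
    """
    while True:
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            return  # clean end of file, or a header cut off mid-write
        length, checksum = HEADER.unpack(header)
        payload = stream.read(length)
        if len(payload) < length or zlib.crc32(payload) != checksum:
            return  # payload cut off or corrupted: treat as end of log
        yield payload


def demo():
    """Write five records, chop off the last 3 bytes, and read back what survives."""
    buf = io.BytesIO()
    for step in range(5):
        write_record(buf, f"step={step}".encode())
    truncated = buf.getvalue()[:-3]  # simulate a process killed mid-write
    return [p.decode() for p in read_records(io.BytesIO(truncated))]


if __name__ == "__main__":
    print(demo())
```

In this sketch the reader gives up at the first damaged record, so anything written after that point is not recovered. Whether the test build behaves the same way is not something this thread confirms.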
Special thanks to @SAKURA-CAT for the prompt response and support in identifying this solution.