shreddit
Error with numeric subreddits in GDPR mode
shreddit reliably chokes on parsing comments.csv when the subreddit field is all-numeric, e.g. for /r/404 or /r/2012. For example, if I change ...,technology,... to ...,2012,... in the first record of the comments.csv in my GDPR export, I get the following error:
2023-07-08T21:34:11.153874Z INFO Shredding Comments...
at src/main.rs:52
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error(Deserialize { pos: Some(Position { byte: 63, line: 1, record: 1 }), err: DeserializeError { field: None, kind: Message("data did not match any variant of untagged enum Source") } })', src/sources/gdpr.rs:17:41
stack backtrace:
0: rust_begin_unwind
at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
1: core::panicking::panic_fmt
at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
2: core::result::unwrap_failed
at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1687:5
3: core::result::Result<T,E>::unwrap
at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1089:23
4: shreddit::sources::gdpr::list::{{closure}}
at ./src/sources/gdpr.rs:17:39
5: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once
at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ops/function.rs:310:13
...
Maybe the csv Reader is producing a numeric type when there's an all-numeric sequence, and then deserialize fails because the types don't match?
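To make that guess concrete, here is a minimal, standard-library-only sketch of how a type-inferring field decoder could produce this failure. It is a toy model, not the csv crate's or shreddit's actual code: the names Inferred, infer and expect_string are hypothetical, and the inference order (unsigned, signed, float, then text) is an assumption. The idea is that a target type which cannot say what it expects up front (an untagged enum, for instance) leaves the decoder to guess the type from the field text, so "2012" arrives as an integer and a variant with subreddit: String no longer matches.

```rust
// Toy model of type inference on a CSV field. NOT the csv crate's code;
// all names below are hypothetical and exist only for illustration.
#[derive(Debug, PartialEq)]
enum Inferred {
    Unsigned(u64),
    Signed(i64),
    Float(f64),
    Text(String),
}

// Guess a type from the raw field text, trying numeric types first
// (assumed order) and falling back to text.
fn infer(field: &str) -> Inferred {
    if let Ok(n) = field.parse::<u64>() {
        Inferred::Unsigned(n)
    } else if let Ok(n) = field.parse::<i64>() {
        Inferred::Signed(n)
    } else if let Ok(x) = field.parse::<f64>() {
        Inferred::Float(x)
    } else {
        Inferred::Text(field.to_string())
    }
}

// Stand-in for a struct field declared as `String`: it accepts only the
// text case, so a field that was inferred as a number is rejected.
fn expect_string(value: Inferred) -> Result<String, String> {
    match value {
        Inferred::Text(s) => Ok(s),
        other => Err(format!("expected a string, got {:?}", other)),
    }
}

fn main() {
    for field in ["technology", "2012", "404"] {
        println!("{:?} -> {:?}", field, expect_string(infer(field)));
    }
}
```

If this is what is happening, a fix that keeps the field would need to accept both numbers and strings for subreddit and convert them to a String, rather than relying on the inferred type.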
My current workaround is to remove the subreddit: String field from the Gdpr enum variant in comment.rs, as it's not currently used for anything. Removing the offending lines from the .csv should also work.