shreddit icon indicating copy to clipboard operation
shreddit copied to clipboard

Error with numeric subreddits in GDPR mode

Open timmc opened this issue 1 year ago • 1 comments

shreddit reliably chokes on parsing comments.csv when the subreddit field is all-numeric, e.g. for /r/404 or /r/2012. For example, if I change ...,technology,... to ...,2012,... in the first record of my comments.csv in the GDPR export, I get the following error:

  2023-07-08T21:34:11.153874Z  INFO  Shredding Comments...
    at src/main.rs:52

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error(Deserialize { pos: Some(Position { byte: 63, line: 1, record: 1 }), err: DeserializeError { field: None, kind: Message("data did not match any variant of untagged enum Source") } })', src/sources/gdpr.rs:17:41
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
   1: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
   2: core::result::unwrap_failed
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1687:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1089:23
   4: shreddit::sources::gdpr::list::{{closure}}
             at ./src/sources/gdpr.rs:17:39
   5: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ops/function.rs:310:13
...

Maybe the csv Reader is producing a numeric type when there's an all-numeric sequence, and then deserialize fails because the types don't match?

My current workaround is to remove the subreddit: String field from the Gdpr enum variant in comment.rs, as it's not currently used for anything. Removing the offending lines from the .csv should also work.

timmc avatar Jul 08 '23 21:07 timmc