Load generator source crashed and failed to restart
What version of Materialize are you using?
v0.27.0-alpha.8
How did you install Materialize?
Materialize Cloud
What is the issue?
I created an auction load generator source:
CREATE SOURCE lg FROM LOAD GENERATOR auction;
I let it run for about 30m. It crashed. I don’t have the logs from the first crash, unfortunately. Kubernetes doesn’t save them and I can’t find them in Honeycomb. Upon restart, the storaged process entered a crash loop:
2022-07-30T06:43:34.749299Z INFO storaged: about to bind to 0.0.0.0:2100
2022-07-30T06:43:34.749324Z INFO storaged: serving internal HTTP server on 0.0.0.0:6878
2022-07-30T06:43:34.752799Z INFO mz_service::grpc: Starting to listen on 0.0.0.0:2100
2022-07-30T06:43:56.166688Z INFO mz_service::grpc: GrpcServer: remote client connected
2022-07-30T06:43:56.186778Z INFO send_operation{operation="GetObject" service="s3"}:provide_credentials{provider=default_chain}: aws_config::meta::credentials::chain: provider in chain did not provide credentials provider=Environment context=environment variable not set
2022-07-30T06:43:56.186824Z INFO send_operation{operation="GetObject" service="s3"}:provide_credentials{provider=default_chain}: aws_config::meta::credentials::chain: provider in chain did not provide credentials provider=Profile context=No profiles were defined
2022-07-30T06:43:56.187093Z INFO send_operation{operation="GetObject" service="s3"}:provide_credentials{provider=default_chain}:send_operation{operation="AssumeRoleWithWebIdentity" service="sts"}: aws_http::auth: provider returned CredentialsNotLoaded, ignoring
2022-07-30T06:43:56.413516Z INFO send_operation{operation="GetObject" service="s3"}:provide_credentials{provider=default_chain}: aws_config::meta::credentials::chain: loaded credentials provider=WebIdentityToken
2022-07-30T06:43:56.811398Z INFO tokio_postgres::connection: NOTICE: relation "consensus" already exists, skipping
thread 'timely:work-0' panicked at 'invalid as_of: as_of(Antichain { elements: [0] }) < since(Antichain { elements: [1659163020001] })', src/storage/src/source/reclock.rs:92:9
stack backtrace:
0: rust_begin_unwind
at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/std/src/panicking.rs:584:5
1: core::panicking::panic_fmt
at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/panicking.rs:142:14
2: mz_storage::source::reclock::ReclockOperator::new::{{closure}}
at ./var/lib/buildkite-agent/builds/buildkite-builders-d43b1b5-i-0dc80afdafce99961-1/materialize/tests/src/storage/src/source/reclock.rs:92:9
3: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/future/mod.rs:91:19
4: mz_storage::source::create_raw_source::{{closure}}::{{closure}}
at ./var/lib/buildkite-agent/builds/buildkite-builders-d43b1b5-i-0dc80afdafce99961-1/materialize/tests/src/storage/src/source/mod.rs:742:41
5: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/future/mod.rs:91:19
6: <async_stream::async_stream::AsyncStream<T,U> as futures_core::stream::Stream>::poll_next
at ./cargo/registry/src/github.com-1ecc6299db9ec823/async-stream-0.3.3/src/async_stream.rs:53:17
7: mz_storage::source::create_raw_source::{{closure}}::{{closure}}
at ./var/lib/buildkite-agent/builds/buildkite-builders-d43b1b5-i-0dc80afdafce99961-1/materialize/tests/src/storage/src/source/mod.rs:890:50
8: mz_storage::source::util::source::{{closure}}::{{closure}}
at ./var/lib/buildkite-agent/builds/buildkite-builders-d43b1b5-i-0dc80afdafce99961-1/materialize/tests/src/storage/src/source/util.rs:103:17
9: timely::dataflow::operators::generic::builder_rc::OperatorBuilder<G>::build::{{closure}}::{{closure}}
at ./cargo/git/checkouts/timely-dataflow-4c0cc365061cd263/ee65cdc/timely/src/dataflow/operators/generic/builder_rc.rs:123:31
10: timely::dataflow::operators::generic::builder_rc::OperatorBuilder<G>::build_reschedule::{{closure}}
at ./cargo/git/checkouts/timely-dataflow-4c0cc365061cd263/ee65cdc/timely/src/dataflow/operators/generic/builder_rc.rs:162:26
11: <timely::dataflow::operators::generic::builder_raw::OperatorCore<T,L> as timely::scheduling::Schedule>::schedule
at ./cargo/git/checkouts/timely-dataflow-4c0cc365061cd263/ee65cdc/timely/src/dataflow/operators/generic/builder_raw.rs:204:9
12: timely::progress::subgraph::PerOperatorState<T>::schedule
at ./cargo/git/checkouts/timely-dataflow-4c0cc365061cd263/ee65cdc/timely/src/progress/subgraph.rs:684:30
13: timely::progress::subgraph::Subgraph<TOuter,TInner>::activate_child
at ./cargo/git/checkouts/timely-dataflow-4c0cc365061cd263/ee65cdc/timely/src/progress/subgraph.rs:344:26
14: <timely::progress::subgraph::Subgraph<TOuter,TInner> as timely::scheduling::Schedule>::schedule
at ./cargo/git/checkouts/timely-dataflow-4c0cc365061cd263/ee65cdc/timely/src/progress/subgraph.rs:310:17
15: timely::progress::subgraph::PerOperatorState<T>::schedule
at ./cargo/git/checkouts/timely-dataflow-4c0cc365061cd263/ee65cdc/timely/src/progress/subgraph.rs:684:30
16: timely::progress::subgraph::Subgraph<TOuter,TInner>::activate_child
at ./cargo/git/checkouts/timely-dataflow-4c0cc365061cd263/ee65cdc/timely/src/progress/subgraph.rs:344:26
17: <timely::progress::subgraph::Subgraph<TOuter,TInner> as timely::scheduling::Schedule>::schedule
at ./cargo/git/checkouts/timely-dataflow-4c0cc365061cd263/ee65cdc/timely/src/progress/subgraph.rs:310:17
18: timely::worker::Wrapper::step::{{closure}}
at ./cargo/git/checkouts/timely-dataflow-4c0cc365061cd263/ee65cdc/timely/src/worker.rs:754:57
19: core::option::Option<T>::map
at ./rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/option.rs:909:29
20: timely::worker::Wrapper::step
at ./cargo/git/checkouts/timely-dataflow-4c0cc365061cd263/ee65cdc/timely/src/worker.rs:754:26
21: timely::worker::Worker<A>::step_or_park
at ./cargo/git/checkouts/timely-dataflow-4c0cc365061cd263/ee65cdc/timely/src/worker.rs:396:38
22: mz_storage::storage_state::Worker<A>::run_client
at ./var/lib/buildkite-agent/builds/buildkite-builders-d43b1b5-i-0dc80afdafce99961-1/materialize/tests/src/storage/src/storage_state.rs:109:13
23: mz_storage::storage_state::Worker<A>::run
at ./var/lib/buildkite-agent/builds/buildkite-builders-d43b1b5-i-0dc80afdafce99961-1/materialize/tests/src/storage/src/storage_state.rs:93:33
24: mz_storage::protocol::server::serve::{{closure}}
at ./var/lib/buildkite-agent/builds/buildkite-builders-d43b1b5-i-0dc80afdafce99961-1/materialize/tests/src/storage/src/protocol/server.rs:87:9
25: timely::execute::execute::{{closure}}
at ./cargo/git/checkouts/timely-dataflow-4c0cc365061cd263/ee65cdc/timely/src/execute.rs:287:22
26: timely_communication::initialize::initialize_from::{{closure}}
at ./cargo/git/checkouts/timely-dataflow-4c0cc365061cd263/ee65cdc/communication/src/initialize.rs:316:33
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Stream closed EOF for environment-4777bfc1-2bd8-4a13-a5dd-a44928652335-0/storage-u1-0 (default)
I suspect that load generator sources do not restart correctly. Hopefully this readily repros locally.
cc @mjibson
Relevant log output
No response
This doesn't repro locally with the process orchestrator. I then started a local kind cluster, started Materialize in it, and ran a copy tail in a kubectl exec psql terminal. In k9s I killed the environmentd pod, which disconnected the session, but on reconnect the tail was fine. Then I killed the storaged pod. This caused the tail output to stop producing new rows for some seconds, then the rows resumed exactly as expected.
Unless we have a better way to repro this, I'm not sure there's much else I can do. Is it possible this wasn't a load generator bug, but an as_of/since problem? We've had some changes there recently.
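For anyone following along, here's a minimal sketch of the invariant that panic is complaining about, using timely's Antichain (the helper name is mine, not the actual reclock.rs code): a source can only start reading at an as_of that the remap shard's since has not already advanced past.

```rust
use timely::progress::Antichain;
use timely::PartialOrder;

// Sketch only: `check_as_of` is a made-up name; the real check lives in
// src/storage/src/source/reclock.rs. The invariant is `since <= as_of`.
fn check_as_of(as_of: &Antichain<u64>, since: &Antichain<u64>) {
    assert!(
        PartialOrder::less_equal(since, as_of),
        "invalid as_of: as_of({:?}) < since({:?})",
        as_of,
        since
    );
}

fn main() {
    let since = Antichain::from_elem(1_659_163_020_001u64);
    // A restarted source asking to read from time 0 violates the invariant,
    // because the shard has already been compacted up to `since`.
    let as_of = Antichain::from_elem(0u64);
    check_as_of(&as_of, &since); // panics, mirroring the backtrace above
}
```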
Let me give a repro a shot. Cockroach started applying a maintenance update at the exact moment this crashed, so I'm going to try restarting PostgreSQL locally and see if that triggers the same crash.
Ok, here's something interesting:
$ bin/environmentd --reset
$ psql -h localhost -p 6875 -U materialize materialize
> CREATE SOURCE lg FROM LOAD GENERATOR auction
> SELECT * FROM lg
# Wait a minute.
$ killall storaged
$ psql -h localhost -p 6875 -U materialize materialize
> SELECT * FROM lg
# This will hang for however long it takes storaged to catch back up. If storaged was up for a minute before it was killed, it seems to take about a minute to catch back up.
I'm going to try running these steps again but waiting 10m to kill storaged. That might expose the crash loop I was seeing in cloud.
I did observe the waiting-for-the-same-amount-of-time behavior. I had assumed it was just storaged startup time, but it did feel longer than I expected. https://github.com/MaterializeInc/materialize/blob/9850d790a6e9f694bc6c04097eef1bc65a1f4752/src/storage/src/source/generator.rs#L84 is supposed to handle this, but it assumes the offset is set to the number of Finalized messages it sent back in all previous executions. Is that not the case?
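For context, the intent of that code is roughly the following (a sketch with placeholder names, not the actual generator.rs implementation): on restart, skip the rows previous runs already emitted instead of replaying them in real time.

```rust
// Sketch only: relies on `offset` being exactly the number of finalized
// messages emitted by all previous executions of the source.
struct MzOffset {
    offset: u64,
}

fn fast_forward<I: Iterator>(rows: &mut I, resume: &MzOffset) {
    // Skip rows that earlier runs already emitted and finalized, without
    // sleeping between them. If the offset under-counts we re-emit
    // duplicates; if it over-counts we silently drop rows.
    for _ in 0..resume.offset {
        if rows.next().is_none() {
            break;
        }
    }
}

fn main() {
    let mut rows = 0i64..; // stand-in for the counter generator's row stream
    fast_forward(&mut rows, &MzOffset { offset: 457 });
    assert_eq!(rows.next(), Some(457)); // resumes right after the skipped prefix
}
```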
> Is that not the case?
I don't know! But I assume not. I just did an experiment where I waited 18m between starting and killing storaged. It didn't cause the crash I saw in cloud, but my SELECT against the source is now quite hung. EXPLAIN TIMESTAMP shows an 18m skew between the current time and the source upper.
I'll look into this.
Adding some debugs:
dbg!(&offset);
let start = Instant::now();
// Time how long it takes to skip the previously emitted rows.
for _ in 0..offset.offset {
    dbg!(rows.next());
}
dbg!(start.elapsed());
After letting the counter load generator run into the 400s at a 1-second tick interval, this prints:
[src/storage/src/source/generator.rs:84] &offset = MzOffset {
offset: 457,
}
...
[src/storage/src/source/generator.rs:87] rows.next() = Some(
Row{[
Int64(
456,
),
]},
)
[src/storage/src/source/generator.rs:87] rows.next() = Some(
Row{[
Int64(
457,
),
]},
)
[src/storage/src/source/generator.rs:89] start.elapsed() = 37.418ms
And running a COPY TAIL of that takes only a few seconds after startup to start producing tail results again, so this is not obviously the generator's problem. Repeating this a few times, though, it was sometimes quite slow after startup. I noticed this in the logs:
compute-cluster-1-replica-1: 2022-08-02T20:33:14.636149Z INFO mz_persist_client::r#impl::machine: snapshot s20d52d65-76ba-4ff8-a719-5fd042a18949 as of Antichain { elements: [1659472403778] } not yet available for upper Antichain { elements: [1659472394001] } retrying in 64ms
compute-cluster-1-replica-1: 2022-08-02T20:33:14.708200Z INFO mz_persist_client::r#impl::machine: snapshot s20d52d65-76ba-4ff8-a719-5fd042a18949 as of Antichain { elements: [1659472403778] } not yet available for upper Antichain { elements: [1659472394001] } retrying in 128ms
compute-cluster-1-replica-1: 2022-08-02T20:33:14.848384Z INFO mz_persist_client::r#impl::machine: snapshot s20d52d65-76ba-4ff8-a719-5fd042a18949 as of Antichain { elements: [1659472403778] } not yet available for upper Antichain { elements: [1659472395001] } retrying in 256ms
compute-cluster-1-replica-1: 2022-08-02T20:33:15.129814Z INFO mz_persist_client::r#impl::machine: snapshot s20d52d65-76ba-4ff8-a719-5fd042a18949 as of Antichain { elements: [1659472403778] } not yet available for upper Antichain { elements: [1659472395001] } retrying in 512ms
compute-cluster-1-replica-1: 2022-08-02T20:33:15.626943Z INFO mz_persist_client::r#impl::machine: snapshot s20d52d65-76ba-4ff8-a719-5fd042a18949 as of Antichain { elements: [1659472403778] } not yet available for upper Antichain { elements: [1659472395001] } retrying in 1.024s
compute-cluster-1-replica-1: 2022-08-02T20:33:16.735004Z INFO mz_persist_client::r#impl::machine: snapshot s20d52d65-76ba-4ff8-a719-5fd042a18949 as of Antichain { elements: [1659472403778] } not yet available for upper Antichain { elements: [1659472396001] } retrying in 2.048s
compute-cluster-1-replica-1: 2022-08-02T20:33:18.608365Z INFO mz_persist_client::r#impl::machine: snapshot s20d52d65-76ba-4ff8-a719-5fd042a18949 as of Antichain { elements: [1659472403778] } not yet available for upper Antichain { elements: [1659472398001] } retrying in 4.096s
compute-cluster-1-replica-1: 2022-08-02T20:33:22.956492Z INFO mz_persist_client::r#impl::machine: snapshot s20d52d65-76ba-4ff8-a719-5fd042a18949 as of Antichain { elements: [1659472403778] } not yet available for upper Antichain { elements: [1659472403001] } retrying in 8.192s
Seems that computed is doing an exponential retry backoff, which I do think is responsible for this.
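The shape of the retry in those log lines is roughly this (a sketch, not the actual mz_persist_client code; function and parameter names are made up):

```rust
use std::time::Duration;

// Keep retrying until the shard's upper passes the requested as_of,
// doubling the backoff on each attempt.
fn wait_for_upper(mut upper: impl FnMut() -> u64, as_of: u64) {
    let mut backoff = Duration::from_millis(64);
    // A snapshot at `as_of` becomes readable once the upper is strictly greater.
    while upper() <= as_of {
        println!(
            "snapshot as of {as_of} not yet available for upper {}, retrying in {:?}",
            upper(),
            backoff
        );
        std::thread::sleep(backoff);
        backoff *= 2; // 64ms, 128ms, 256ms, ... quickly grows to multi-second waits
    }
}

fn main() {
    let start = std::time::Instant::now();
    // Simulate an upper that advances by one each second, chasing an as_of
    // that is 5 "seconds" ahead at startup.
    wait_for_upper(|| start.elapsed().as_secs(), 5);
}
```

Once the backoff has grown to several seconds, a reader can sit idle for almost a full extra interval after the upper has actually passed the as_of, which would explain the occasional multi-second stalls.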
> Seems that computed is doing an exponential retry backoff, which I do think is responsible for this.
@mjibson would you like the compute team to take a look at this? (cc @chaas)
Restarting cockroach after creating the source spits out a bunch of
environmentd: 2022-08-08T23:36:37.849813Z WARN mz_storage::controller::persist_write_handles: Retrying indeterminate table write: Indeterminate { inner: error connecting to server: Connection refused (os error 111)
but then resumes normally. It's possible this got fixed or is triggered in some other way. I'm not sure if the compute backoff is a real bug or not. Someone from that team can decide!
I made an auction source, then ran copy (tail demo) to stdout in a terminal and got some rows. Then pkill storaged, which paused the tail; I see in the logs that it restarts after some seconds, and the tail stream resumes as expected. Then pkill computed and same thing: the tail pauses, computed restarts, the tail resumes as normal. I am currently out of ideas for how to repro this locally.
One thing we haven't tried (I think) is killing Cockroach, waiting a few minutes, and then turning it back on. Maybe something goes wrong in that scenario.
Killed cockroach for 4 minutes, then restarted it with a 50ms tick. The tail resumed a few seconds later, and I didn't see any crashes in the console. One thing that did happen, though, was that the tail had queued all of the data it had generated over those minutes. But even with a 50ms interval, that should not be enough to OOM anything that is queuing it.
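Back-of-envelope, assuming roughly one update per tick: 4 min ≈ 240 s, and 240 s / 0.05 s ≈ 4,800 queued updates, which is tiny.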
Removed the triage label as this is on the project backlog and discussions are happening.
I want to re-raise this, as I just ran into this bug following the "getting started guide", and I think many (all, even?) people will run into this, @uce.
I can also confirm it still happens with v0.28.0. The error message looks slightly different now, but I'm quite sure it's the same bug:
> kubectl logs --context mzcloud-staging-eu-west-1-0 -n environment-da15e202-1c10-4bf8-8a84-a6994358457d-0 storage-u6-0 --previous 1
2022-10-19T09:17:51.591939Z INFO storaged: about to bind to 0.0.0.0:2100
2022-10-19T09:17:51.591984Z INFO storaged: serving internal HTTP server on 0.0.0.0:6878
2022-10-19T09:17:51.597586Z INFO mz_service::grpc: Starting to listen on 0.0.0.0:2100
2022-10-19T09:18:04.958248Z INFO mz_service::grpc: GrpcServer: remote client connected
2022-10-19T09:18:05.725990Z INFO provide_credentials{provider=default_chain}: aws_http::auth: provider returned CredentialsNotLoaded, ignoring
2022-10-19T09:18:06.287845Z INFO tokio_postgres::connection: NOTICE: relation "consensus" already exists, skipping
2022-10-19T09:18:07.712344Z INFO mz_persist_client::internal::maintenance: Force expiring reader (r485af5b0-4531-416a-b005-9eca1b2a9fc9) of shard (sd7a4d70f-644b-4160-8a52-fc37a63f5def) due to inactivity
2022-10-19T09:18:07.712455Z INFO mz_persist_client::internal::maintenance: Force expiring reader (r536ad54b-0d29-4b98-8766-d61efb0b0190) of shard (sd7a4d70f-644b-4160-8a52-fc37a63f5def) due to inactivity
2022-10-19T09:18:07.712532Z INFO mz_persist_client::internal::maintenance: Force expiring writer (w9203d424-52b1-451e-b965-0ed9e1b83112) of shard (sd7a4d70f-644b-4160-8a52-fc37a63f5def) due to inactivity
2022-10-19T09:18:07.712609Z INFO mz_persist_client::internal::maintenance: Force expiring writer (wfed998fe-ccdb-4d3a-aa53-75985fc407eb) of shard (sd7a4d70f-644b-4160-8a52-fc37a63f5def) due to inactivity
thread 'timely:work-0' panicked at 'assertion failed: PartialOrder::less_equal(&self.since, &new_since)', src/storage/src/source/reclock.rs:537:9
stack backtrace:
0: rust_begin_unwind
at ./rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/panicking.rs:584:5
1: core::panicking::panic_fmt
at ./rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/panicking.rs:142:14
2: core::panicking::panic
at ./rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/panicking.rs:48:5
3: mz_storage::source::reclock::ReclockOperator::compact::{{closure}}
at ./var/lib/buildkite-agent/builds/buildkite-builders-d43b1b5-i-01110d61e963703c7-1/materialize/tests/src/storage/src/source/reclock.rs:537:9
4: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
at ./rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/future/mod.rs:91:19
5: mz_storage::source::source_reader_pipeline::remap_operator::{{closure}}::{{closure}}
at ./var/lib/buildkite-agent/builds/buildkite-builders-d43b1b5-i-01110d61e963703c7-1/materialize/tests/src/storage/src/source/source_reader_pipeline.rs:966:70
6: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
at ./rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/future/mod.rs:91:19
7: <futures_util::future::future::fuse::Fuse<Fut> as core::future::future::Future>::poll
at ./cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.24/src/future/future/fuse.rs:86:37
8: <core::pin::Pin<P> as core::future::future::Future>::poll
at ./rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/future/future.rs:124:9
9: <timely::dataflow::operators::generic::builder_rc::OperatorBuilder<G> as mz_timely_util::operators_async_ext::OperatorBuilderExt<G>>::build_async::{{closure}}::{{closure}}
at ./var/lib/buildkite-agent/builds/buildkite-builders-d43b1b5-i-01110d61e963703c7-1/materialize/tests/src/timely-util/src/operators_async_ext.rs:165:43
10: timely::dataflow::operators::generic::builder_rc::OperatorBuilder<G>::build_reschedule::{{closure}}
at ./cargo/git/checkouts/timely-dataflow-4c0cc365061cd263/9548bf9/timely/src/dataflow/operators/generic/builder_rc.rs:162:26
11: <timely::dataflow::operators::generic::builder_raw::OperatorCore<T,L> as timely::scheduling::Schedule>::schedule
at ./cargo/git/checkouts/timely-dataflow-4c0cc365061cd263/9548bf9/timely/src/dataflow/operators/generic/builder_raw.rs:204:9
12: timely::progress::subgraph::PerOperatorState<T>::schedule
at ./cargo/git/checkouts/timely-dataflow-4c0cc365061cd263/9548bf9/timely/src/progress/subgraph.rs:684:30
13: timely::progress::subgraph::Subgraph<TOuter,TInner>::activate_child
at ./cargo/git/checkouts/timely-dataflow-4c0cc365061cd263/9548bf9/timely/src/progress/subgraph.rs:344:26
14: <timely::progress::subgraph::Subgraph<TOuter,TInner> as timely::scheduling::Schedule>::schedule
at ./cargo/git/checkouts/timely-dataflow-4c0cc365061cd263/9548bf9/timely/src/progress/subgraph.rs:310:17
15: timely::progress::subgraph::PerOperatorState<T>::schedule
at ./cargo/git/checkouts/timely-dataflow-4c0cc365061cd263/9548bf9/timely/src/progress/subgraph.rs:684:30
16: timely::progress::subgraph::Subgraph<TOuter,TInner>::activate_child
at ./cargo/git/checkouts/timely-dataflow-4c0cc365061cd263/9548bf9/timely/src/progress/subgraph.rs:344:26
17: <timely::progress::subgraph::Subgraph<TOuter,TInner> as timely::scheduling::Schedule>::schedule
at ./cargo/git/checkouts/timely-dataflow-4c0cc365061cd263/9548bf9/timely/src/progress/subgraph.rs:310:17
18: timely::worker::Wrapper::step::{{closure}}
at ./cargo/git/checkouts/timely-dataflow-4c0cc365061cd263/9548bf9/timely/src/worker.rs:754:57
19: core::option::Option<T>::map
at ./rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/option.rs:929:29
20: timely::worker::Wrapper::step
at ./cargo/git/checkouts/timely-dataflow-4c0cc365061cd263/9548bf9/timely/src/worker.rs:754:26
21: timely::worker::Worker<A>::step_or_park
at ./cargo/git/checkouts/timely-dataflow-4c0cc365061cd263/9548bf9/timely/src/worker.rs:396:38
22: mz_storage::storage_state::Worker<A>::run_client
at ./var/lib/buildkite-agent/builds/buildkite-builders-d43b1b5-i-01110d61e963703c7-1/materialize/tests/src/storage/src/storage_state.rs:134:17
23: mz_storage::storage_state::Worker<A>::run
at ./var/lib/buildkite-agent/builds/buildkite-builders-d43b1b5-i-01110d61e963703c7-1/materialize/tests/src/storage/src/storage_state.rs:110:33
24: mz_storage::protocol::server::serve::{{closure}}
at ./var/lib/buildkite-agent/builds/buildkite-builders-d43b1b5-i-01110d61e963703c7-1/materialize/tests/src/storage/src/protocol/server.rs:94:9
25: timely::execute::execute::{{closure}}
at ./cargo/git/checkouts/timely-dataflow-4c0cc365061cd263/9548bf9/timely/src/execute.rs:287:22
26: timely_communication::initialize::initialize_from::{{closure}}
at ./cargo/git/checkouts/timely-dataflow-4c0cc365061cd263/9548bf9/communication/src/initialize.rs:316:33
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Ha! This bug is different from the originally reported one. We changed how far we allow compaction in v0.28.0, and my gut feeling is that on restart we pick a "new since" that is slightly more in the past when just starting up. As long as the assert as_of >= since on creation is upheld, we should be golden though. @guswynn wdyt, could we just turn this into a no-op when new_since is less than since?
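Concretely, something like this (a sketch with a placeholder struct; the real method is ReclockOperator::compact in src/storage/src/source/reclock.rs):

```rust
use timely::progress::Antichain;
use timely::PartialOrder;

struct Reclock {
    since: Antichain<u64>,
}

impl Reclock {
    fn compact(&mut self, new_since: Antichain<u64>) {
        // Today this is `assert!(PartialOrder::less_equal(&self.since, &new_since))`.
        // The idea: treat a `new_since` that lags the current `since` as a
        // no-op instead of panicking, since compacting to an older frontier
        // cannot undo compaction that has already happened.
        if !PartialOrder::less_equal(&self.since, &new_since) {
            return;
        }
        self.since = new_since;
        // ... downgrade read handles, etc.
    }
}

fn main() {
    let mut r = Reclock { since: Antichain::from_elem(1_000u64) };
    r.compact(Antichain::from_elem(500)); // lagging frontier: now a no-op, not a panic
    assert_eq!(r.since.elements(), &[1_000]);
}
```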
also, cc @petrosagg wdyt?
I commented in the linked PR, but I don't think we should remove the assertion, as well-behaving code would never trigger it. I would leave the assertion there until we understand what's going on.