crawlee icon indicating copy to clipboard operation
crawlee copied to clipboard

Revisit API of storages

Open janbuchar opened this issue 1 year ago • 5 comments

There are multiple considerations:

  • We may not need the global accessors (e.g., Dataset.open)
  • Can we support multiple non-named ("default") storages?
  • We should cleanly separate filesystem-backed and memory-only storage

janbuchar avatar Sep 24 '24 08:09 janbuchar

We may not need the global accessors (e.g., Dataset.open)

The open methods are actually the only ones we might need.

We should cleanly separate filesystem-backed and memory-only storage

I am still not sure about this, we still need a way to offload things from memory, otherwise, a long-running scape would fail at some point.

B4nan avatar Sep 24 '24 08:09 B4nan

We should cleanly separate filesystem-backed and memory-only storage

I am still not sure about this, we still need a way to offload things from memory, otherwise, a long-running scape would fail at some point.

As long as the memory-only thing is opt-in and the default is filesystem-backed, I'd argue we don't :shrug:

janbuchar avatar Sep 24 '24 08:09 janbuchar

I see, so the default would be the file system one. Then I am not sure what benefits this would bring, pretty much all the problems we had were about the file system part. And that implementation would still require a lot of in-memory caching most likely, I would even say it would end up as the same as the current memory storage.

B4nan avatar Sep 24 '24 09:09 B4nan

I'd personally be opposed to making fs-backed default, especially for requests (as we've seen the amount of troubles that causes, especially during concurrent accesses)... But we can definitely optimize some parts further I think


Can we support multiple non-named ("default") storages?

Should work with current memory-storage, does it not? memory-storage definitely supports it to my knowledge (.s.getOrCreate() -> new UUID)

vladfrangu avatar Sep 24 '24 09:09 vladfrangu

Should work with current memory-storage, does it not? memory-storage definitely supports it to my knowledge (.s.getOrCreate() -> new UUID)

This would work out of box already if we remove the storage manager cache, its not about the storage implementations, its about the fact that you calling open() twice resolves to the same default storage.

B4nan avatar Sep 24 '24 09:09 B4nan