Replication of cache data to other nodes

Open shannonantony opened this issue 5 years ago • 20 comments

Implement replicating nuster node cache to other nodes

shannonantony avatar Jan 21 '19 06:01 shannonantony

That would be great. In fact, it's on the todo list. I'll work on this after the HAProxy v1.9 migration, some cache headers, and the disk persistence implementation. Progress is slow as I don't have as much spare time as I used to.

jiangwenyuan avatar Jan 21 '19 10:01 jiangwenyuan

Thanks for your quick response.

Will you be using peers for the data syncing?

shannonantony avatar Jan 21 '19 16:01 shannonantony

Please let me know if you need any testing help. I could be the first tester. Thanks, Jaison

shannonantony avatar Jan 21 '19 20:01 shannonantony

Hi, I haven't looked into peers yet. It would be great if I could just use the sync feature from peers.

jiangwenyuan avatar Jan 22 '19 01:01 jiangwenyuan

Meanwhile, what I'm doing to "solve" this issue is using AWS EFS, which allows me to have several nodes/instances/containers pointing to the same data volume for caching. So regardless of how many containers go up or down, they always share the same volume for caching, which is great. 👍

igorescobar avatar May 23 '20 14:05 igorescobar

I am trying to deploy nuster in a "cluster" (multiple containers) configuration. The only problem I am having is that, while the containers share the cache on a shared disk rather than in memory, the disk loader doesn't synchronize the cache across the different containers. Is there a way to force the disk loader to constantly ensure all containers have the updated cache?

hugos99 avatar Jul 30 '20 10:07 hugos99

@HugoS99 What's your setup? A local disk directory mounted in multiple containers?

jiangwenyuan avatar Jul 30 '20 11:07 jiangwenyuan

@jiangwenyuan It's a Kubernetes cluster with an NFS folder shared across all containers, so multiple containers access a single shared folder.

hugos99 avatar Jul 30 '20 11:07 hugos99

@HugoS99 It should work then. The disk loader only loads the cache files' metainfo into memory on startup; it does not sync or update cache files. Since you are using a shared NFS folder, all containers see the same content, so there's no need to sync. If container 1 creates a cache file in the shared NFS folder, container 2 can use that file to serve the identical request. Make sure you are using memory off disk on.

jiangwenyuan avatar Jul 30 '20 11:07 jiangwenyuan

Yeah, I know, but since the load process only runs on startup and does not keep checking for new cached data, if a pod dies, the surviving pods don't "know" about the data that pod cached, so they perform the request again.

This could be solved if the disk load thread ran constantly, trying to read the data every X nanoseconds.

For reference, I am using memory off disk on to ensure that the cache is disk only.

@jiangwenyuan I made a fork of the repository with an ugly fix for this problem. Check it out at https://github.com/HugoS99/nuster/blob/master/src/nuster/store/disk.c. I basically just make the thread run constantly.

hugos99 avatar Jul 30 '20 11:07 hugos99

@jiangwenyuan Want me to create a pull request with my ugly fix?

hugos99 avatar Jul 30 '20 11:07 hugos99

@HugoS99 You are right, and I now understand why the high cache MISS issue in #82 happens in a multiple-container setup.

When the loader is done, nuster no longer checks the disk and relies only on memory, so if another container creates a new file, it does not know about it.

Thanks for the proposal, but constant loading does not solve this problem (because a request may arrive before the next round of loading) and is not necessary in a single-nuster setup.

I will probably add a new mode for multiple-nuster setups.
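
To illustrate why periodic reloading still leaves a window for misses, here is a minimal sketch in C. It is an idealised model, not nuster code: the function name `seen_by_periodic_scan`, the millisecond timestamps, and the assumption that scans are instantaneous and happen at exact multiples of the interval are all made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Idealised model (assumption): a node rescans the shared directory at
 * times 0, interval_ms, 2*interval_ms, ... and each scan is instantaneous.
 * Returns true if a file created at created_ms has been picked up by a
 * scan by the time a request arrives at request_ms.
 */
bool seen_by_periodic_scan(uint64_t created_ms, uint64_t request_ms,
                           uint64_t interval_ms)
{
    if (request_ms < created_ms)
        return false;              /* the file does not exist yet */
    if (interval_ms == 0)
        return true;               /* idealised continuous scanning */

    /* First scan at or after the creation time (round up). */
    uint64_t next_scan =
        ((created_ms + interval_ms - 1) / interval_ms) * interval_ms;
    return next_scan <= request_ms;
}
```

With an interval of 300, a file created at 310 is not visible to a request at 400, because the next scan only happens at 600. A real scan also takes time, so the actual window is larger than this model suggests.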

jiangwenyuan avatar Jul 30 '20 11:07 jiangwenyuan

I agree that a new "cluster" mode should be created, but I believe the constant read is the most generic and natural way to provide this feature. The best solution would be to let the user set the time between loading rounds. You may see my implementation for reference (where each cycle takes 300ms). Simply allow the user to set that value, with the clarification that there may be cache misses with this setup; a user can simply put 0 to ensure the best possible synchronization.

hugos99 avatar Jul 30 '20 12:07 hugos99

nst_disk_load does not load all files at once, only several files at a time. If you have, say, 10M files, it will take a long time to complete the load, and it will generate a lot of I/O.

jiangwenyuan avatar Jul 30 '20 12:07 jiangwenyuan

So a method that compares the files already loaded with those not yet loaded is required, and with that the solution would be valid, right?

hugos99 avatar Jul 30 '20 12:07 hugos99

Yeah. The simplest solution that comes to mind right now is a new mode that always checks the disk if the entry does not exist in memory.

So currently the logic is (for disk on memory off):

  • check memory(does disk.file exist?)
    • yes, use disk.file
    • no, loaded?
      • no, check disk (does x/xx/uuid file exist?)
      • yes, goto backend

So the new mode is:

  • check memory(does disk.file exist?)
    • yes, use disk.file
    • no, loaded?
      • no, check disk (does x/xx/uuid file exist?)
      • yes and (new mode), check disk (does x/xx/uuid file exist?)
      • else, goto backend
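
The two flows above can be sketched as a single decision function. This is only an illustration under stated assumptions, not nuster's actual code: the names `lookup`, `enum lookup_result`, and the boolean parameters are hypothetical, the disk probe is passed in as a precomputed flag for simplicity, and falling through to the backend when the probe finds no file is assumed.

```c
#include <stdbool.h>

/* Possible outcomes of a cache lookup (hypothetical names). */
enum lookup_result {
    USE_MEMORY_ENTRY,  /* memory already knows the disk file */
    USE_DISK_FILE,     /* found by probing the shared disk */
    GO_TO_BACKEND      /* nothing cached, forward the request */
};

/*
 * in_memory:         the in-memory dict has an entry with a disk file
 * loaded:            the disk loader has finished its startup scan
 * disk_always_check: the proposed new mode (assumed name)
 * on_disk:           result of probing the cache directory for the file
 */
enum lookup_result lookup(bool in_memory, bool loaded,
                          bool disk_always_check, bool on_disk)
{
    if (in_memory)
        return USE_MEMORY_ENTRY;

    /* Current behaviour probes the disk only while the loader is still
     * running. The new mode keeps probing after loading is done. */
    if (!loaded || disk_always_check)
        return on_disk ? USE_DISK_FILE : GO_TO_BACKEND;

    return GO_TO_BACKEND;
}
```

With `loaded` true and the new mode off, a file written by another container is never probed, which matches the MISS behaviour described earlier in the thread.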

jiangwenyuan avatar Jul 30 '20 12:07 jiangwenyuan

That looks like a great, simple solution! If you tell me which file that logic lives in, I can try to implement an initial draft!

hugos99 avatar Jul 30 '20 12:07 hugos99

Cool! Thanks a lot! I'll update this comment later.

@HugoS99 After some thinking, I found that this always-check should be a global setting rather than a rule-level one, since the disk store applies to all rules.

So my thought is to add a new mode in https://github.com/jiangwenyuan/nuster#global-nuster-cachenosql:

nuster cache on|off [data-size size] [dict-size size] [dir DIR] [dict-cleaner n] [data-cleaner n] [disk-cleaner n] [disk-loader n] [disk-saver n] [clean-temp on|off] [disk-always-check on|off]

nuster nosql on|off [data-size size] [dict-size size] [dir DIR] [dict-cleaner n] [data-cleaner n] [disk-cleaner n] [disk-loader n] [disk-saver n] [clean-temp on|off] [disk-always-check on|off]

The name could be disk-always-check or shared-disk-mode, or something else if you have better words :)

Following how clean-temp is handled, add a new variable (say disk_always_check) in https://github.com/jiangwenyuan/nuster/blob/master/include/haproxy/global-t.h#L196 (and the same for nosql)

and here https://github.com/jiangwenyuan/nuster/blob/master/src/haproxy.c#L196 (and nosql)

add parser: https://github.com/jiangwenyuan/nuster/blob/master/src/nuster/parser.c#L642 (and nosql)

and https://github.com/jiangwenyuan/nuster/blob/master/src/nuster/cache/engine.c#L810 https://github.com/jiangwenyuan/nuster/blob/master/src/nuster/nosql/engine.c#L1020

        if(!disk->loaded || disk_always_check) {

Probably that's all for the check.

But you might need to check the code that references loaded, such as https://github.com/jiangwenyuan/nuster/blob/master/src/nuster/cache/engine.c#L984, to make sure it works in multiple-container mode when we delete the cache.
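
As a rough idea of what parsing the new keyword could look like, here is a standalone sketch in C. It does not use nuster's real parser or data structures: `struct cache_opts`, `parse_disk_always_check`, and the argv-style token array are all hypothetical, and only the keyword name and its on|off values come from the proposal above.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical settings holder; nuster keeps its options elsewhere. */
struct cache_opts {
    bool disk_always_check;
};

/*
 * Scan argv-style tokens for "disk-always-check on|off".
 * Returns 0 on success (including when the keyword is absent) and -1
 * when the keyword has a missing or invalid value.
 */
int parse_disk_always_check(const char *const *args, int nargs,
                            struct cache_opts *opts)
{
    for (int i = 0; i < nargs; i++) {
        if (strcmp(args[i], "disk-always-check") != 0)
            continue;
        if (i + 1 >= nargs)
            return -1;                      /* keyword without a value */
        if (strcmp(args[i + 1], "on") == 0)
            opts->disk_always_check = true;
        else if (strcmp(args[i + 1], "off") == 0)
            opts->disk_always_check = false;
        else
            return -1;                      /* neither on nor off */
        i++;                                /* skip the value token */
    }
    return 0;
}
```

The parsed flag would then feed the condition shown above, so the disk is probed whenever the loader is unfinished or the flag is on.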

jiangwenyuan avatar Jul 30 '20 12:07 jiangwenyuan

Hey @jiangwenyuan, I just finished the initial implementation. The only thing missing is the delete portion; I didn't understand that code, so I thought I should ask you. Do you want a PR for this?

hugos99 avatar Jul 30 '20 15:07 hugos99

@HugoS99 Sure, thanks

jiangwenyuan avatar Jul 31 '20 00:07 jiangwenyuan