Implementing Multi-Container Capability
I've done some work to enable running multiple instances of the container against the same set of watch directories using shared volumes. The driver for this change is that I wanted to run this image as a service within a Docker Swarm cluster (or k8s, I don't believe there is a major difference in philosophy there) and be able to scale up to perform multiple conversions at the same time.
I'm using flock under the covers to lock a hidden file that acts as a mutex, which prevents multiple concurrent processes from interacting with the same file at the same time. This allows a container to safely "claim" a file from the watch directory while the others wait until they can acquire the lock, then either claim the next file in the loop or exit the processing loop when there is no work left to do.
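For reference, here is a minimal sketch of the claiming flow (file names and paths are illustrative, not necessarily the exact ones used in this PR): each instance serializes its claim step behind a shared lock file, so only one container at a time can mark a file as taken.

```shell
# Illustrative sketch only. The lock file and claim-tracking file names
# are placeholders, not the exact files used by this PR.
claim_next_file() {
    local watch_dir="$1"
    (
        # Block until this instance holds the mutex; 200 is an arbitrary FD.
        flock 200
        for f in "$watch_dir"/*; do
            [ -f "$f" ] || continue
            # Skip files another instance has already claimed.
            grep -qxF "$f" /config/.claimed_files 2>/dev/null && continue
            # Record the claim and report the file back to the caller.
            echo "$f" >> /config/.claimed_files
            echo "$f"
            exit 0
        done
        exit 1  # no unclaimed work left
    ) 200>/config/.claim.lock
}
```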
I have tested the implementation with great success running:
- multiple containers on the same host
- one container each on two separate VMs, hosted on the same server (using glusterfs as a distributed file system)
- one container each on two separate VMs, hosted on separate servers (using glusterfs as a distributed file system)
This is pretty early on, but I wanted to make sure this was aligned with the project owner before completing the work.
Remaining work, in my opinion:
- feature flag to disable the capability (or possibly the inverse)
- a mechanism for tracking containers that no longer exist and releasing claimed files
@jlesage I wasn't sure how to add you to the pull request as a reviewer (first pull request on GitHub), so just tagging you here for my own peace of mind. No rush on review/feedback.
Thank you for this. I have a few comments:
- `flock` is not supported by all filesystems. This can be a problem.
- Why does the lock occur on `/config`? As per my understanding, `/config` is not the shared folder; `/watch` would be the one. Sharing the same `/config` folder across multiple containers can be a problem, because it contains per-instance things, like the log, cache, etc. (see the `log` and `xdg` folders).
Regarding flock, I will try to do some investigation. Do you know of any filesystems that do not support flock? I'll do some digging, but if you know of one, that would help jump-start my research. I'm a bit of a novice with some of the low-level details (I'm a Java guy 😄). I recall seeing something about using a directory as a locking mechanism, so I can explore that as time allows.
With regards to the lock-tracking file and the mutex file being in the /config directory: I was following the pattern of the successful and failed processing files. I'm a little unsure what you mean by "not the shared folder", as the use case here is running the container as a service à la Kubernetes or Docker Swarm, so the mounts are shared across the replicas of the service, thus making all of the mounts shared.
I'm running this right now as a Docker Swarm service with two replicas on different VMs and have not had any issues with per-instance caching or logging. The processes both write to the same files, which may be a bit confusing as it stands, since the logging framework does not prepend the machine information to the logs, but I have not found it to be a problem yet. I have not attempted using the web/VNC interface though, so that could be where the implementation runs into problems.
So in the context of multiple instances sharing all of the mounts, including the config mount, what would your recommendation be here? I'd be glad to continue to iterate as time permits (assuming that you find value in this feature to begin with).
Hey. I need to achieve the very same thing. Where do you stand with this PR? I plan to order 10 VM instances in a cloud, have them connect to my home server via NFS, and perform the conversions remotely. Thanks!
@mattwahl can we get together and come up with a feasible solution? I have a couple of ideas and need to compile them into the best possible outcome for this use case. Thanks
As it stands right now the solution works, but as the owner of the repo laid out, there are some filesystems that may not support flock. I can say that Ubuntu and macOS work with the solution as is. I don't have a mechanism for testing other operating systems and have been unable to devote any additional time to it (hard to believe it's been over a year!). I will say that prior to merge there is a small defect where the file gets claimed while it is still being copied, so the tracking file will see transient hashes until the transfer has completed. It has not impacted me at all, just a nuisance, but one that I would want to see resolved before this gets merged in for others to consume.
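One way that copy-in-progress defect could be handled (an illustrative sketch, not what the branch currently does) is to require the file size to stay stable for a short interval before the file is claimed:

```shell
# Illustrative sketch: only treat a file as ready once its size has
# stopped changing. Assumes a stat that supports -c %s (GNU/BusyBox).
is_file_stable() {
    local f="$1"
    local size1 size2
    size1="$(stat -c %s "$f" 2>/dev/null)" || return 1
    sleep 5
    size2="$(stat -c %s "$f" 2>/dev/null)" || return 1
    [ "$size1" = "$size2" ]
}
```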
I think the 2 points I raised are still valid:
1. The `flock` problem is not about OSes, but about the variety of filesystems used in the wild. For example, I'm pretty sure network shares, like NFS and SMB, fail to implement this functionality. I think a network share is a good choice people would want to use when implementing a multi-container setup (like @aronmgv plans to do).
2. Using the same `/config` across multiple containers is dangerous and not future-proof. This folder is not designed to be shared across multiple instances since it contains container-specific data.
That being said, other strategies can be used:
- For 1), folder creation can be used as a locking mechanism to replace `flock` (see the sketch after this list).
- For 2), I think the lock should be done on the watch folder instead. The watch folder is the one shared across multiple container instances. However, this implies two things:
  - The watch folder requires write permission.
  - `AUTOMATED_CONVERSION_KEEP_SOURCE=0` is mandatory: removing the source files once processed is required to make sure other instances don't process the same file again. The other solution would be to keep locks in place in the watch folder, but I'm not sure it's better (watch folder pollution).
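A rough sketch of the folder-creation idea (lock path and helper names are illustrative): `mkdir` fails atomically when the directory already exists, and this atomicity generally holds even on network filesystems where `flock` support is unreliable.

```shell
# Illustrative sketch of directory-based locking; the lock path is a
# placeholder. mkdir either creates the directory (lock acquired) or
# fails because it already exists (lock held by another instance).
acquire_lock() {
    local lock_dir="$1"
    until mkdir "$lock_dir" 2>/dev/null; do
        sleep 1  # another instance holds the lock; retry
    done
}

release_lock() {
    rmdir "$1"
}

# Usage:
#   acquire_lock /watch/.hb_claim_lock
#   ... claim the next file ...
#   release_lock /watch/.hb_claim_lock
```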
I started playing with that and will see how it goes.
Well explained, thanks! In my scenario I do remove the source from the watch directory, which is the intended behavior; you could probably hardcode that when the multi-container capability is enabled. It's also important to realize that when using the trash, it will be located within the particular watch folder (this is due to Docker volume restrictions: it would take a long time to copy the source video file over NFS to a global trash, since the Docker environment is remote and the source would have to travel back and forth). And other containers need this trash folder available for writing.
Anyway, let me just share my plan: I want to have multiple remote HandBrake containers that point to volumes mounted locally via NFS:
```yaml
volumes:
  # CONFIG
  - $PWD/config:/config:rw
  # LOGS - These container-specific stats can be handled here and mounted explicitly.
  # There should probably be logic so the user does not have to create them
  # manually on the host.. don't remember if I had to or not..
  - /mnt/caradhras/ADV/.stats/${hostname}_conversion.log:/config/log/hb/conversion.log:rw
  - /mnt/caradhras/ADV/.stats/${hostname}_failed_conversions:/config/failed_conversions:rw
  - /mnt/caradhras/ADV/.stats/${hostname}_successful_conversions:/config/successful_conversions:rw
  # STORAGE
  # OUTPUT FOLDER
  - /mnt/caradhras/ADV/@DONE:/output:rw
  # TRASH FOLDER
  - /mnt/caradhras/ADV/.trash:/trash:rw
  # WATCH FOLDERS - in my case I have 30 lol
  - /mnt/caradhras/ADV/${MKV_2160P_ULTRA}:/watch:rw
  - /mnt/caradhras/ADV/${MKV_2160P_HIGH}:/watch2:rw
  - /mnt/caradhras/ADV/${MKV_2160P_MEDIUM}:/watch3:rw
  - /mnt/caradhras/ADV/${MKV_2160P_LOW}:/watch4:rw
  - /mnt/caradhras/ADV/${MKV_1080P_ULTRA}:/watch5:rw
```
Watch folder part:
```yaml
- AUTOMATED_CONVERSION_PRESET_2=Matroska/${MKV_2160P_HIGH}
- AUTOMATED_CONVERSION_FORMAT_2=mkv
- AUTOMATED_CONVERSION_USE_TRASH_2=1
- AUTOMATED_CONVERSION_KEEP_SOURCE_2=0
- AUTOMATED_CONVERSION_TRASH_DIR_2=/watch2/.trash
- AUTOMATED_CONVERSION_OUTPUT_DIR_2=/output
- AUTOMATED_CONVERSION_OUTPUT_SUBDIR_2=SAME_AS_SRC
```
@jlesage Hey. Sorry to bother you, but can you share an ETA here? Feel free to involve me if needed! Thanks!
Thank you for the reminder. I pushed changes for this. Expect a new release very soon :)
Can't wait!! Thanks
Just tested it using 2 remote servers doing conversion over NFS and it WORKS without any problem! @jlesage many thanks! 🥰
Great, thank you for the feedback! I guess we can now close this PR.
Appreciate the time and effort @jlesage. Sorry I didn't have the bandwidth to see it through. I do have a question about one of your comments regarding the config volume, but I'll move that to the discussions section. Again, 🙏 thank you for all the great work with this project.
@jlesage I have a suggestion here. I noticed that the lock check is done after waiting the default 5-second delay. This is a bit annoying and a waste of time when you have 10 containers doing the conversion. Imagine a new container is started, the 10th one. It has the same picking logic as the others, so it will try to take the same video files the others already took; the first 9 videos are already being converted and thus locked, so it waits 9 x 5 seconds before finding the first unlocked video file to process:
```
Processing watch folder '/watch28'...
Waiting 5 seconds before processing '/watch28/AAAAAAAAAAAAAAA.wmv'...
Skipping '/watch28/AAAAAAAAAAAAAAA': currently beging processed by another instance.
Waiting 5 seconds before processing '/watch28/BBBBBBBBBBBBBBB.wmv'...
Skipping '/watch28/BBBBBBBBBBBBBBB.wmv': currently beging processed by another instance.
...
Waiting 5 seconds before processing '/watch28/JJJJJJJJJJJJJJJ.wmv'...
Starting conversion of '/watch28/JJJJJJJJJJJJJJJ.wmv' (c3d6a529fcb00ade18ae9b162101970f) using preset 'General/MP4.0p.HIGH'...
1 title(s) to process.
Executing pre-conversion hook...
```
Can this logic be swapped, so that it first skips videos that are already locked and only then waits the 5 seconds (or whatever the timer is configured for)? Thanks a lot!
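In shell terms, the suggested ordering might look something like this (helper names and variables are hypothetical, just to illustrate the swap):

```shell
# Hypothetical sketch of the suggested ordering: check the lock first,
# and only spend the configurable delay on files this instance can take.
for f in "$WATCH_DIR"/*; do
    if is_locked_by_other_instance "$f"; then
        echo "Skipping '$f': currently being processed by another instance."
        continue
    fi
    echo "Waiting $DELAY seconds before processing '$f'..."
    sleep "$DELAY"
    process_file "$f"
done
```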
Yes, I will adjust that.
@jlesage I have another proposition for a better workflow: simply purge ignored_conversion.log at container startup. The current behavior creates a problem when a container converting a certain file never starts again; that video file will never get converted and will be ignored by the others. Clearing the log at startup would ensure that, on the next startup, the file gets converted by another container.
However, this requires a restart of some other container.. hmm, do we really need it? Isn't a lock check alone enough? I don't think it's a big hassle for a container to go through all the locks every time it wants to pick a file for conversion. What do you think? Thanks!
@jlesage ^, thanks!