nocache

turn off the cache for a directory

Open beroal opened this issue 8 years ago • 14 comments

This is more of a support request. Can I disable the cache for specific directories only? The benefit is as follows: a program transmits big files over a network and stores the hosts' metadata in a metadata file. It would be better not to disable the cache for the metadata file.

beroal avatar Mar 18 '16 23:03 beroal

A more generic solution would be to introduce a method to either include (“whitelist”) or exclude (“blacklist”) certain glob patterns. The problem here is that the only way you can configure nocache’s behavior is by specifying environment variables, so this approach would mean you have to: think of two good variable names, parse their contents into an array or a list and then match every open() call against each expression. Feel free to implement this if you need it; I’ll take a look at it, but I currently don’t have the time to do it myself.

Feh avatar Mar 19 '16 00:03 Feh

> The problem here is that the only way you can configure nocache’s behavior is by specifying environment variables

Why so? There are commands that accept both options and a command to run, for example "nice", "sudo", "env", "time", and "xargs".

beroal avatar Mar 19 '16 10:03 beroal

True; but the functionality of nocache is achieved by the wrapper shell script setting the LD_PRELOAD env variable. The initializer of nocache.so is only called from the specified executable and has no access to command line arguments. See for example how the -n option is implemented.

Feh avatar Mar 19 '16 10:03 Feh
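A minimal sketch of the mechanism described above, assuming a preloaded library that reads its settings from the environment when it is loaded. The variable name EXAMPLE_PRELOAD_REPEAT and the functions read_int_setting, example_init and example_get_repeat are hypothetical and are not taken from nocache's source.

```c
#include <stdlib.h>

/* Hypothetical variable name, chosen for this sketch only. */
#define EXAMPLE_ENV_VAR "EXAMPLE_PRELOAD_REPEAT"

static int example_repeat = 1;  /* default when the variable is unset */

/* Parse a positive integer setting from the environment.
 * Returns `fallback` if the variable is unset, empty, or not a number in 1..1000. */
int read_int_setting(const char *name, int fallback)
{
    const char *value = getenv(name);
    if (value == NULL || *value == '\0')
        return fallback;

    char *end = NULL;
    long parsed = strtol(value, &end, 10);
    if (*end != '\0' || parsed <= 0 || parsed > 1000)
        return fallback;
    return (int)parsed;
}

/* Runs when the shared object is loaded, before main() of the wrapped program.
 * It cannot see the wrapper's command line, only the inherited environment. */
__attribute__((constructor))
static void example_init(void)
{
    example_repeat = read_int_setting(EXAMPLE_ENV_VAR, 1);
}

/* Accessor used by the rest of the (hypothetical) library. */
int example_get_repeat(void)
{
    return example_repeat;
}
```

In this arrangement a wrapper script would translate a command line option into an environment variable before setting LD_PRELOAD and executing the target program.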

> A more generic solution would be to introduce a method to either include (“whitelist”) or exclude (“blacklist”) certain glob patterns.

A wildcard never matches the pathname separator, so how do I specify all descendants of a directory by a glob pattern?

beroal avatar Mar 19 '16 11:03 beroal

Use fnmatch(3) without FNM_PATHNAME.

$ cat fnmatch.c
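#include <fnmatch.h>
int main(int argc, char *argv[]) {
        /* Exit status 0 means the argument matched; flags = 0, so '*' also matches '/'. */
        if (argc < 2)
                return 2;
        return fnmatch("foo/*", argv[1], 0);
}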
$ gcc -Wall -o fnmatch fnmatch.c
$ ./fnmatch foo/bar/baz && echo it matches
it matches

Feh avatar Mar 19 '16 11:03 Feh
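To make the effect of FNM_PATHNAME explicit, here is a small sketch that contrasts the two modes. The helper names matches_across_slashes and matches_within_component are made up for this illustration.

```c
#include <fnmatch.h>
#include <stdbool.h>

/* flags = 0: '*' may match any characters, including '/'. */
bool matches_across_slashes(const char *pattern, const char *path)
{
    return fnmatch(pattern, path, 0) == 0;
}

/* FNM_PATHNAME: '*' stops at '/', as in shell pathname expansion. */
bool matches_within_component(const char *pattern, const char *path)
{
    return fnmatch(pattern, path, FNM_PATHNAME) == 0;
}
```

With the pattern "foo/*", the first helper accepts "foo/bar/baz" (all descendants of foo), while the second accepts only direct children such as "foo/bar".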

Well, I implemented this feature request in my fork. I decided to use POSIX Extended Regular Expressions because they are more straightforward and more powerful than glob patterns. What do you think?

beroal avatar Mar 19 '16 15:03 beroal

> Because the library remembers which pages (ie., 4K-blocks of the file) were already in file system cache when the file was opened, these will not be marked as "don't need", because other applications might need that, although they are not actively used (think: hot standby).

I don't understand this. Do you think that the OS uses only the last suggestion instead of combining the suggestions from all processes?

beroal avatar Mar 19 '16 15:03 beroal

I’ve added some comments to your commit https://github.com/beroal/nocache/commit/c3956d384d04837dc33dc1756dd7e73754aae919

> Do you think that OS uses the last suggestion instead of joining suggestions from all processes?

The reality is a bit more complicated, but in principle, yes. If process A reads file X completely it’s in the FS cache; if B now maps X and does an fadvise with “don’t need” on the file descriptor, the contents are evicted from the cache; subsequent reads of A from X will require going back to the storage medium to retrieve data.

In other words: Without this mechanism, you might evict files that are in active use, thereby impacting other processes.

Feh avatar Mar 19 '16 15:03 Feh
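The idea of remembering which pages were already cached, and sparing them later, can be sketched roughly as follows. This is an illustration under stated assumptions (Linux, mincore(2) and posix_fadvise(2) available), not nocache's actual code; the function names snapshot_resident_pages and drop_pages_not_in_snapshot are invented here, and a real implementation would probably advise on contiguous ranges rather than page by page.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Record, for each page of the file, whether it is currently in the page cache.
 * Returns a malloc'ed vector with *npages entries (low bit set = resident),
 * or NULL on error or for an empty file. The caller frees the vector. */
unsigned char *snapshot_resident_pages(int fd, size_t *npages)
{
    struct stat st;
    long page = sysconf(_SC_PAGESIZE);
    if (fstat(fd, &st) != 0 || st.st_size <= 0 || page <= 0)
        return NULL;

    size_t len = (size_t)st.st_size;
    *npages = (len + (size_t)page - 1) / (size_t)page;

    /* PROT_NONE is enough: the mapping is only inspected, never accessed. */
    void *map = mmap(NULL, len, PROT_NONE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return NULL;

    unsigned char *vec = malloc(*npages);
    if (vec != NULL && mincore(map, len, vec) != 0) {
        free(vec);
        vec = NULL;
    }
    munmap(map, len);
    return vec;
}

/* Advise the kernel to drop only the pages that were NOT resident in the
 * snapshot, so data another process had already cached is left alone.
 * Returns 0 on success or the error number from posix_fadvise. */
int drop_pages_not_in_snapshot(int fd, const unsigned char *vec, size_t npages)
{
    long page = sysconf(_SC_PAGESIZE);
    for (size_t i = 0; i < npages; i++) {
        if (vec[i] & 1)
            continue;  /* was cached before we touched the file: keep it */
        int rc = posix_fadvise(fd, (off_t)i * page, page, POSIX_FADV_DONTNEED);
        if (rc != 0)
            return rc;
    }
    return 0;
}
```

The snapshot would be taken when the file is opened and the selective drop performed when it is closed.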

Then the Linux kernel is kind of stupid.

beroal avatar Mar 19 '16 16:03 beroal

Regarding maybe_store_pageinfo: all my additions contain cond or pattern. I group code by keywords. The other suggestions are implemented.

beroal avatar Mar 19 '16 16:03 beroal

Documentation. The cache is disabled for a file iff (I and not E), where I holds iff the file name matches the regular expression in the environment variable NOCACHE_PATTERN_INCLUDE (default: true) and E holds iff the file name matches the regular expression in the environment variable NOCACHE_PATTERN_EXCLUDE (default: false). Both variables are treated as POSIX Extended Regular Expressions.

beroal avatar Mar 19 '16 16:03 beroal
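The rule "(I and not E)" could be sketched like this. This is not the code from the fork; the function names ere_matches and cache_disabled_for are hypothetical, a NULL pattern stands for an unset variable, and treating an invalid pattern like an unset one is an assumption of this sketch.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if `text` matches the POSIX ERE `pattern`.
 * A NULL pattern (variable unset) yields `if_unset`.
 * Assumption of this sketch: an invalid pattern also yields `if_unset`. */
bool ere_matches(const char *pattern, const char *text, bool if_unset)
{
    regex_t re;
    if (pattern == NULL)
        return if_unset;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return if_unset;

    bool matched = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}

/* The cache is disabled iff the path is included (default: true)
 * and not excluded (default: false). */
bool cache_disabled_for(const char *path, const char *include, const char *exclude)
{
    bool included = ere_matches(include, path, true);
    bool excluded = ere_matches(exclude, path, false);
    return included && !excluded;
}
```

A caller would pass getenv("NOCACHE_PATTERN_INCLUDE") and getenv("NOCACHE_PATTERN_EXCLUDE") as the two patterns. Compiling each expression once at library start-up instead of on every call would be the more efficient design.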

I left some comments on https://github.com/beroal/nocache/commit/1e6061c9879b21f1d22607cc5d783f5abdf20a3f again.

> Then the Linux kernel is kind of stupid.

Yes, and you’re welcome to improve it. The code is in mm/fadvise.c. Beware though that good and robust cache invalidation is one of the harder problems in programming.

> Documentation.

Can you please add command line options to the nocache shell wrapper and add documentation to the Readme?

Feh avatar Mar 19 '16 23:03 Feh

> I’d make explicit what you expect, i.e. if(regcomp(…) != NULL)

Look at the type of regcomp: it returns an int, not a pointer.

beroal avatar Mar 20 '16 09:03 beroal
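For reference, POSIX declares regcomp as returning int: zero on success and a nonzero error code otherwise, so an explicit check compares against 0. A small sketch follows; the wrapper name compile_ere is made up for this example.

```c
#include <regex.h>
#include <stdio.h>

/* Compile a POSIX ERE. Returns 0 on success; otherwise returns the
 * nonzero regcomp error code and prints a readable message to stderr. */
int compile_ere(regex_t *re, const char *pattern)
{
    int rc = regcomp(re, pattern, REG_EXTENDED | REG_NOSUB);
    if (rc != 0) {
        char msg[128];
        regerror(rc, re, msg, sizeof msg);
        fprintf(stderr, "invalid pattern \"%s\": %s\n", pattern, msg);
    }
    return rc;
}
```

On success the caller is responsible for calling regfree on the compiled expression.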

> Can you please add command line options to the nocache shell wrapper and add documentation to the Readme?

Sorry, I don't know the Bash programming language and I'm happy with that. ;-)

beroal avatar Mar 20 '16 09:03 beroal