open database files

Open DanGrayson opened this issue 7 years ago • 13 comments

Loading a package opens its raw documentation database file and leaves it open for future access. But the default limit on open files for a process can be as small as 256, and we have something close to 170 packages now. Loading them all just to see what's in them, as is done to generate the list of all packages with their headlines for the documentation, can therefore use up a large share of the available file descriptors.
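For a sense of how close this is to the limit on a given machine, here is a small Python sketch (Unix only) that reads the per-process limit and subtracts the number of databases that would be held open. The function name `descriptor_headroom` is hypothetical, and 170 is simply the package count quoted above.

```python
import resource


def descriptor_headroom(open_databases):
    """Return how many descriptors remain under the soft limit, or None if unlimited."""
    # getrlimit returns (soft, hard); the soft limit is the one the process hits first.
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft - open_databases


if __name__ == "__main__":
    # 170 is the approximate package count mentioned in this issue.
    print("headroom with 170 open databases:", descriptor_headroom(170))
```

The result ignores descriptors already used for stdin, stdout, stderr, and any other open files, so the real headroom is somewhat smaller.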

It might be better to wait until the documentation for the loaded package is needed and then to load the entire database into memory, if the number of nodes is small enough. (Macaulay2Doc has more than 5000 nodes, so we don't want to load it into memory.)
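A minimal sketch of that idea, written in Python rather than in the Macaulay2 sources: nothing is opened until the first lookup, and a small database is copied into a dict and closed immediately. The class name `LazyDocs`, the threshold of 1000 nodes, and the use of `dbm.dumb` in place of gdbm are all assumptions made for illustration.

```python
import dbm.dumb


class LazyDocs:
    """Documentation store that does not touch its file until the first lookup."""

    def __init__(self, path, max_in_memory=1000):
        self.path = path
        self.max_in_memory = max_in_memory  # hypothetical threshold on node count
        self._cache = None  # dict of nodes, once a small database has been loaded
        self._db = None     # open handle, kept only for large databases

    def _ensure_loaded(self):
        if self._cache is not None or self._db is not None:
            return
        db = dbm.dumb.open(self.path, "r")
        if len(db) <= self.max_in_memory:
            # Small database: copy everything and release the file descriptor.
            self._cache = {k: db[k] for k in db.keys()}
            db.close()
        else:
            # Large database: keep the handle and read nodes on demand.
            self._db = db

    def __getitem__(self, key):
        if isinstance(key, str):
            key = key.encode("utf-8")  # stored keys are bytes
        self._ensure_loaded()
        store = self._cache if self._cache is not None else self._db
        return store[key]

    @property
    def holds_file_open(self):
        return self._db is not None

    def close(self):
        if self._db is not None:
            self._db.close()
            self._db = None
```

With this arrangement, loading many packages just to list their headlines would cost no descriptors, and only the large databases would hold a file open after first use.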

DanGrayson avatar Apr 17 '18 18:04 DanGrayson

Maybe the thing to do is to invent a datatype that implements a FIFO queue, to contain the open database files. Each time one is used, remove it and add it to the queue again. Each time the queue gets to a size of 200 or so, remove the first one and close it. Each time a database is encountered that is closed, reopen it and add it to the queue.
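The queue described above can be sketched as follows. This is a Python illustration under stated assumptions, not code from the Macaulay2 sources: `OpenDatabasePool` and its `opener` argument are hypothetical names, and the handle only needs a `close()` method.

```python
from collections import OrderedDict


class OpenDatabasePool:
    """Keep at most `capacity` databases open, closing the least recently used one."""

    def __init__(self, opener, capacity=200):
        self._opener = opener        # callable: path -> handle with a close() method
        self._capacity = capacity    # "200 or so", as suggested above
        self._open = OrderedDict()   # path -> handle, least recently used first

    def get(self, path):
        if path in self._open:
            # Used again: move it to the back of the queue.
            self._open.move_to_end(path)
        else:
            if len(self._open) >= self._capacity:
                # Queue is full: close the entry at the front.
                _, oldest = self._open.popitem(last=False)
                oldest.close()
            # Closed or never opened: (re)open it and add it to the queue.
            self._open[path] = self._opener(path)
        return self._open[path]

    def close_all(self):
        while self._open:
            _, handle = self._open.popitem(last=False)
            handle.close()
```

Callers would always go through `get(path)` instead of holding on to a handle, since any handle may be closed once enough other databases have been used.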

DanGrayson avatar May 23 '19 17:05 DanGrayson

An LRU cache might be a better choice. Where should this go?

rz137 avatar May 24 '19 02:05 rz137

Yes, that's exactly the term for what I described.

The database file is stored in an object of class Package under the key "raw documentation database", so searching for that string in the files in the directory M2/Macaulay2/m2 will locate all the uses of those databases.

We have 193 packages now.

DanGrayson avatar May 24 '19 14:05 DanGrayson

This might be a silly question, but why do we store documentation in databases? Why not just text files?

mahrud avatar Apr 01 '20 19:04 mahrud

Speed.

DanGrayson avatar Apr 01 '20 22:04 DanGrayson

gdbm:

i4 : time help Macaulay2Doc
     -- used 0.0410353 seconds
...

vs. man:

[mahrud@noether ~]$ time man bash
...
sys	0m0.124s

Is 0.06s worth the effort?

mahrud avatar Jun 05 '20 18:06 mahrud

I don't understand the point of your timing comparison -- we don't have man pages for the Macaulay2 documentation.

DanGrayson avatar Jun 05 '20 19:06 DanGrayson

The point is that just reading from a file for each documentation node is just as fast.

To be clear, I'm not suggesting this is what we should do. I can't even do proper experiments to compare, or just go in and fix this now, because I can't make sense of the code and all the places that databases pop up. This is just in response to your question about what experiments I did that tell me the speedup is not significant.

mahrud avatar Jun 05 '20 19:06 mahrud

Okay -- so the speedup might be significant, after all. I'll do a proper comparison. It would be great if the speedup were insignificant now.

DanGrayson avatar Jun 05 '20 19:06 DanGrayson

I just wanted to chime in and point out that (1) the speedup is not insignificant: the 0.083s absolute difference translates to a 3x slowdown; (2) but on that scale and in that context it's probably irrelevant; (3) however, the original problem remains, and it seems much easier to just insert an LRU cache instead of reworking the documentation organization.

rz137 avatar Jun 05 '20 23:06 rz137

The size of man bash was about 100x that of help "Macaulay2Doc", which is actually nonexistent... bad examples.

mahrud avatar Jun 05 '20 23:06 mahrud

Is there a disadvantage to having a single database instead of one per package?

mahrud avatar Aug 19 '20 10:08 mahrud

That's a great idea!

To distinguish the items from various packages, one should prepend the name of the package to the documentation key -- that will be straightforward.

Then one has to decide what to do with the packages installed by the user in the user's application directory. Have a database for all of them, I guess. Same for any other directory where the user installs packages, and for any directory on the prefixPath:

i2 : stack prefixPath

o2 = /Users/dan/Library/Application Support/Macaulay2/local/
     /Users/dan/src/M2/M2.git/M2/BUILD/dan/builds.tmp/einsteinium-development/usr-dist/

The routine uninstallPackage could remove the appropriate entries, too, since the name of the package is a prefix to the key.
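A sketch of the key scheme, in Python with any dict-like object standing in for the database. The separator `::` and the function names are assumptions made for illustration; the separator must not occur in a package name, so that a package whose name is a prefix of another package's name is not affected by its removal.

```python
SEP = "::"  # hypothetical separator, assumed never to appear in a package name


def doc_key(package, key):
    """Build the database key for a documentation node of a given package."""
    return f"{package}{SEP}{key}"


def install_docs(db, package, nodes):
    """Store every node of a package under its prefixed key."""
    for key, body in nodes.items():
        db[doc_key(package, key)] = body


def uninstall_docs(db, package):
    """Delete every entry whose key starts with the package prefix; return the count."""
    prefix = package + SEP
    # Collect the keys first so the mapping is not modified while iterating over it.
    doomed = [k for k in db.keys() if k.startswith(prefix)]
    for k in doomed:
        del db[k]
    return len(doomed)
```

For example, after installing nodes for two packages named `Foo` and `FooBar` (made-up names), `uninstall_docs(db, "Foo")` removes only the `Foo::...` entries, because `FooBar::...` does not start with `Foo::`.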

The prefixPath is short, so it doesn't matter if those database files stay open all the time.

DanGrayson avatar Aug 19 '20 11:08 DanGrayson