CompoundDb

Support multiple SDF files

Open · stanstrup opened this issue · 4 comments

For example, for PubChem, whose compound data is distributed as many SDF files.

Processing the files in parallel (e.g. with pbapply) would be nice.

See also https://github.com/EuracBiomedicalResearch/CompoundDb/issues/1#issuecomment-340341955

stanstrup, Nov 04 '17 20:11

If compound_tbl_sdf were internal to createCompDb (so that you would always call createCompDb directly), you could append each file's records to the SQLite file instead of holding everything in memory, which avoids the memory requirements. That is what I did in my approach for PubChem.
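To make the memory argument concrete, here is a minimal sketch, written with Python's standard library rather than CompoundDb or R, of appending records to a single SQLite file one SDF file at a time. Records are streamed into `executemany`, so only the current record is held in memory instead of every file's contents. The helper names `iter_sdf_records` and `append_sdf_files`, the `compound` table layout and the `FORMULA` field name are hypothetical, and the SDF parser is deliberately minimal (it reads only the title line and `> <FIELD>` data items).

```python
import re
import sqlite3
from pathlib import Path
from typing import Iterable, Iterator

# Matches an SDF data-item header such as "> <FORMULA>" and captures the field name.
DATA_HEADER = re.compile(r"^>\s*<([^>]+)>")


def iter_sdf_records(path: Path) -> Iterator[dict]:
    """Stream one SDF file, yielding {'name': title, 'fields': {...}} per record.

    Hypothetical, minimal parser for illustration: it keeps the title line and
    '> <FIELD>' data items and ignores the atom/bond block.
    """
    with open(path, encoding="utf-8") as handle:
        record, field = {"name": None, "fields": {}}, None
        for raw in handle:
            line = raw.rstrip("\r\n")
            if line == "$$$$":  # record delimiter
                yield record
                record, field = {"name": None, "fields": {}}, None
            elif record["name"] is None:
                record["name"] = line  # first line of a record is its title
            elif (match := DATA_HEADER.match(line)):
                field = match.group(1)
                record["fields"][field] = ""
            elif field is not None:
                if line == "":
                    field = None  # a blank line ends the data item
                else:
                    sep = "\n" if record["fields"][field] else ""
                    record["fields"][field] += sep + line


def append_sdf_files(db_path: Path, sdf_paths: Iterable[Path]) -> None:
    """Append the records of each SDF file, one file at a time, to one SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        # Hypothetical schema and field name, for illustration only.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS compound "
            "(source_file TEXT, name TEXT, formula TEXT)"
        )
        for path in sdf_paths:
            rows = (
                (path.name, rec["name"], rec["fields"].get("FORMULA"))
                for rec in iter_sdf_records(path)
            )
            with conn:  # one transaction per file; commits when the file is done
                conn.executemany("INSERT INTO compound VALUES (?, ?, ?)", rows)
    finally:
        conn.close()
```

Committing once per file keeps each transaction bounded, and because the connection stays on one file, this remains a single-writer workflow that SQLite handles without locking problems.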

stanstrup, Nov 05 '17 19:11

Note: createCompDb already supports generating a CompDb from multiple input files; the man page states that you can provide the name(s) of the file(s). I will make this clearer in the help page. So far I have used lapply to process multiple files; I'll switch to bplapply (from BiocParallel).

jorainer, Nov 06 '17 05:11

OK, I have extended the documentation a little. I also tried to enable parallel processing, but that is not possible because SQLite/RSQLite does not support concurrent write operations. I also tried the approaches in https://stackoverflow.com/questions/36831302/parallel-query-of-sqlite-database-in-r and https://www.r-bloggers.com/synchronization-for-r-with-the-flock-package/, but neither helped. So, at present, writing the database in parallel is not possible.

jorainer, Nov 06 '17 08:11

Ah yes, I tried exactly the same things. That's why I ended up creating one SQLite database per SDF file during the parallel runs and then building the final SQLite database from those afterwards.
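The workaround described here (one SQLite database per SDF file, written in parallel, then merged by a single writer) can be sketched with Python's standard library as follows. This illustrates the pattern only; it is not stanstrup's or CompoundDb's code, and the function names, the two-column `compound` table and the title-only SDF reader are hypothetical. Threads are used to keep the sketch portable and easy to run.

```python
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def sdf_titles(path: Path):
    """Yield the title (first) line of each SDF record; hypothetical minimal reader."""
    at_record_start = True
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.rstrip("\r\n")
            if at_record_start:
                yield line
                at_record_start = False
            elif line == "$$$$":  # record delimiter; next line is a new title
                at_record_start = True


def build_part(index: int, sdf_path: Path, out_dir: Path) -> Path:
    """Worker: write one SDF file into its *own* SQLite file, so writers never share a file."""
    part = out_dir / f"part_{index}.sqlite"
    conn = sqlite3.connect(part)
    with conn:  # commits the inserts on success
        conn.execute("CREATE TABLE compound (source_file TEXT, name TEXT)")
        conn.executemany(
            "INSERT INTO compound VALUES (?, ?)",
            ((sdf_path.name, title) for title in sdf_titles(sdf_path)),
        )
    conn.close()
    return part


def merge_parts(parts, final_db: Path) -> None:
    """Single writer: copy every per-file database into the final one via ATTACH."""
    conn = sqlite3.connect(final_db)
    conn.execute("CREATE TABLE IF NOT EXISTS compound (source_file TEXT, name TEXT)")
    for part in parts:
        conn.execute("ATTACH DATABASE ? AS part", (str(part),))
        conn.execute("INSERT INTO compound SELECT * FROM part.compound")
        conn.commit()  # DETACH is not allowed while a transaction is open
        conn.execute("DETACH DATABASE part")
    conn.close()


def build_parallel(sdf_paths, final_db: Path, workers: int = 4) -> None:
    """Build per-file databases in parallel, then merge them sequentially."""
    paths = list(sdf_paths)
    with tempfile.TemporaryDirectory() as tmp:
        out_dir = Path(tmp)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            parts = list(pool.map(build_part, range(len(paths)), paths, [out_dir] * len(paths)))
        merge_parts(parts, final_db)
```

Because each worker owns its own database file, no two connections ever write to the same file, which sidesteps SQLite's single-writer limitation; the merge step is sequential but reduces to a plain `INSERT ... SELECT` per attached database. For CPU-bound parsing, `ProcessPoolExecutor` (under an `if __name__ == "__main__":` guard) would be the closer analogue to the multi-process back ends used by pbapply or BiocParallel.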

stanstrup, Nov 06 '17 08:11