mychembl
New notebook - using MongoDB for chemistry.
There is a blog post by Matt Swain about similarity searching in MongoDB (http://blog.matt-swain.com/post/87093745652/chemical-similarity-search-in-mongodb). NoSQL approaches are becoming increasingly popular, and we have already had some interns working in this area.
We should include MongoDB in myChEMBL and provide a notebook describing some basic usage:
- loading the ChEMBL SDF into MongoDB
- performing similarity search
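The two steps above could be sketched roughly as follows. This is a pure-Python, in-memory stand-in rather than real MongoDB code: the `molecules` list plays the role of a collection, and the document layout (`chembl_id`, `bits`, `count`) and the function names are assumptions made for illustration. The bit-count prefilter follows directly from the definition of Tanimoto similarity. In an actual notebook the fingerprints would presumably come from a cheminformatics toolkit, and the prefilter would become a range query on an indexed count field.

```python
# Hypothetical in-memory stand-in for a MongoDB collection of molecules.
# Each "document" stores a fingerprint as a sorted list of on-bit indices
# plus the number of on bits, which allows a cheap count-based prefilter.

def make_doc(chembl_id, on_bits):
    """Build a document for one molecule (field names are assumptions)."""
    bits = sorted(set(on_bits))
    return {"chembl_id": chembl_id, "bits": bits, "count": len(bits)}


def tanimoto(a, b):
    """Tanimoto similarity of two collections of on-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_search(molecules, query_bits, threshold):
    """Return (chembl_id, score) pairs with score >= threshold, best first."""
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    qn = len(set(query_bits))
    # Tanimoto >= threshold implies threshold*qn <= count <= qn/threshold,
    # so documents outside that range can be skipped without scoring them.
    lo, hi = threshold * qn, qn / threshold
    hits = []
    for doc in molecules:
        if not (lo <= doc["count"] <= hi):
            continue
        score = tanimoto(query_bits, doc["bits"])
        if score >= threshold:
            hits.append((doc["chembl_id"], score))
    return sorted(hits, key=lambda hit: -hit[1])
```

Calling `similarity_search(molecules, query_bits, 0.7)` on a list built with `make_doc` returns the matching identifiers with their scores. Only the candidates that survive the count filter are scored in full.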
In addition to Matt's research, we should investigate whether calculating the Levenshtein distance on fingerprints inside MongoDB (like here: https://gitorious.org/infos-pratiques/podis/source/2cec14c0dd632324a07cdfd7bf9e68422a20423a:) can speed things up. Another idea would be to keep every single bit in a separate field, create indexes on all of those fields, and check whether this performs any better.
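To make both ideas concrete, here is a small pure-Python sketch that could serve as a reference when prototyping them. It assumes fingerprints are written as strings of `0` and `1`; the `bit_N` field names and all function names are hypothetical. A plain Hamming distance is included next to the Levenshtein distance for comparison, since the two can disagree on equal-length bit strings.

```python
def levenshtein(a, b):
    """Edit distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def per_bit_document(chembl_id, bitstring):
    """Document with one field per fingerprint bit (hypothetical layout)."""
    doc = {"chembl_id": chembl_id}
    for i, bit in enumerate(bitstring):
        doc["bit_%d" % i] = int(bit)
    return doc
```

For the fingerprints `"1010"` and `"0101"` the Levenshtein distance is 2 (drop the leading bit, append one at the end) while the Hamming distance is 4, so it is worth checking which notion of distance is actually wanted. For the one-field-per-bit layout, note that MongoDB limits a collection to 64 indexes, so indexing every bit of a long fingerprint individually may not be possible without splitting the data.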
Apart from chemistry, we already have our own Django caching backend, which uses MongoDB, so we can include it and demonstrate to others how to use it.
Great idea