Dropbox River for Elasticsearch (PROJECT STOPPED)
Welcome to the Dropbox River Plugin for Elasticsearch
This river plugin indexes documents from your Dropbox account.
WARNING: The Attachment Plugin must be installed.
Versions
| Dropbox River Plugin | ElasticSearch | Attachment Plugin |
| master (0.2.0) | 0.21.0.Beta1-SNAPSHOT | 1.6.0 |
| 0.1.0 | 0.20.4 | 1.6.0 |
Build Status
Thanks to CloudBees for the build status.
Getting Started
Installation
Just type :
$ bin/plugin -install fr.pilato.elasticsearch.river/dropbox/0.1.0
This will do the job...
-> Installing fr.pilato.elasticsearch.river/dropbox/0.1.0...
Trying http://download.elasticsearch.org/fr.pilato.elasticsearch.river/dropbox/dropbox-0.1.0.zip...
Trying http://search.maven.org/remotecontent?filepath=fr/pilato/elasticsearch/river/dropbox/0.1.0/dropbox-0.1.0.zip...
Trying https://oss.sonatype.org/service/local/repositories/releases/content/fr/pilato/elasticsearch/river/dropbox/0.1.0/dropbox-0.1.0.zip...
Downloading ......DONE
Installed dropbox
Get Dropbox credentials (token and secret)
First, you need to create your own application in Dropbox Developers.
If you create a Full Dropbox application, you will have access to all folders.
If you create an App folder application, you will only have access to your app folder files. You will get Dropbox HTTP Error 403 : {"error": "Forbidden"} errors when accessing other folders.
Note your AppKey and your AppSecret.
You then need to get authorization from the user for this new application.
Just open the _dropbox REST Endpoint with your AppKey and AppSecret parameters: http://localhost:9200/_dropbox/oauth/AppKey/AppSecret
$ curl http://localhost:9200/_dropbox/oauth/AppKey/AppSecret
You will get back a URL:
{
"oauth_token":"OAUTHTOKEN",
"oauth_secret":"OAUTHSECRET",
"url" : "https://www.dropbox.com/1/oauth/authorize?oauth_token=OAUTHTOKEN"
}
Open the URL in your browser. Dropbox will ask you to allow your application to access your Dropbox account.
If you added an oauth_callback parameter to the URL, Dropbox will redirect your user to that endpoint.
For example,
https://www.dropbox.com/1/oauth/authorize?oauth_token=OAUTHTOKEN&oauth_callback=http://yourwebserver/callback will
redirect your user to http://yourwebserver/callback if the user allows your application to access their
Dropbox folders.
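If you build the authorization URL in code, the callback has to be URL-encoded before it is appended. The following is a minimal sketch, not part of the plugin: `build_authorize_url` is a hypothetical helper name, and the only things taken from this README are the authorize URL and the two parameter names.

```python
from urllib.parse import urlencode

# Authorization URL as returned by the _dropbox endpoint in this README.
AUTHORIZE_URL = "https://www.dropbox.com/1/oauth/authorize"


def build_authorize_url(oauth_token, callback=None):
    """Return the URL to open in a browser, optionally with a callback.

    Hypothetical helper for illustration; the plugin does not ship it.
    """
    params = {"oauth_token": oauth_token}
    if callback is not None:
        params["oauth_callback"] = callback
    # urlencode percent-encodes ':' and '/' inside the callback value.
    return AUTHORIZE_URL + "?" + urlencode(params)


print(build_authorize_url("OAUTHTOKEN", "http://yourwebserver/callback"))
```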
Once you get the success reply from Dropbox, you can get the user token and secret by calling:
$ curl http://localhost:9200/_dropbox/oauth/apptoken/appsecret/OAUTHTOKEN/OAUTHSECRET
You will get back a JSON document like the following:
{
"token" : "yourtoken",
"secret" : "yoursecret"
}
Use these values when you create the river (see below).
You can also use the SettingUpDropboxTestsCases test class to get a token and a secret for your user.
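The two REST calls above can also be scripted. The sketch below is one possible way to do it with the Python standard library. The function names and the injectable `fetch` parameter are assumptions made for illustration, and it assumes that the `apptoken/appsecret` path segments of the second call are your AppKey and AppSecret.

```python
import json
from urllib.request import urlopen

# Endpoint exposed by the plugin on a local node (from this README).
BASE = "http://localhost:9200/_dropbox/oauth"


def http_get_json(url):
    """Fetch a URL and decode its JSON body (needs a running node)."""
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def request_authorization(app_key, app_secret, fetch=http_get_json):
    """Step 1: get the request token, its secret and the URL to open."""
    return fetch(f"{BASE}/{app_key}/{app_secret}")


def fetch_user_credentials(app_key, app_secret, step1, fetch=http_get_json):
    """Step 2: once the user has allowed the app, get token and secret."""
    url = (f"{BASE}/{app_key}/{app_secret}/"
           f"{step1['oauth_token']}/{step1['oauth_secret']}")
    reply = fetch(url)
    return reply["token"], reply["secret"]
```

Between the two steps, the user still has to open `step1["url"]` in a browser and allow the application. The `fetch` argument only exists so the flow can be tried out without a running node.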
Creating a Dropbox river
First, we create an index to store our documents (optional):
$ curl -XPUT 'localhost:9200/mydocs/' -d '{}'
We create the river with the following properties :
- AppKey: AAAAAAAAAAAAAAAA
- AppSecret: BBBBBBBBBBBBBBBB
- Token: XXXXXXXXXXXXXXXX
- Secret: YYYYYYYYYYYYYYYY
- Dropbox directory URL: /tmp
- Update Rate: every 15 minutes (15 * 60 * 1000 = 900000 ms)
- Get only docs like *.doc and *.pdf
- Don't index resume*
$ curl -XPUT 'localhost:9200/_river/mydocs/_meta' -d '{
"type": "dropbox",
"dropbox": {
"appkey": "AAAAAAAAAAAAAAAA",
"appsecret": "BBBBBBBBBBBBBBBB",
"token": "XXXXXXXXXXXXXXXX",
"secret": "YYYYYYYYYYYYYYYY",
"name": "My tmp dropbox dir",
"url": "/tmp",
"update_rate": 900000,
"includes": "*.doc,*.pdf",
"excludes": "resume"
}
}'
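The same river definition can be generated instead of typed by hand, which avoids JSON quoting mistakes and keeps the update rate arithmetic explicit. This is a sketch only: `minutes_to_ms` and `river_meta` are hypothetical helper names, and the field names are copied from the curl example above.

```python
import json


def minutes_to_ms(minutes):
    """Convert minutes to milliseconds, the unit used by update_rate."""
    return minutes * 60 * 1000


def river_meta(appkey, appsecret, token, secret, name, url,
               every_minutes, includes=None, excludes=None):
    """Build the _meta document for a Dropbox river (illustrative helper)."""
    dropbox = {
        "appkey": appkey,
        "appsecret": appsecret,
        "token": token,
        "secret": secret,
        "name": name,
        "url": url,
        "update_rate": minutes_to_ms(every_minutes),
    }
    # Only add the filters that were actually given.
    if includes is not None:
        dropbox["includes"] = includes
    if excludes is not None:
        dropbox["excludes"] = excludes
    return {"type": "dropbox", "dropbox": dropbox}


meta = river_meta("AAAAAAAAAAAAAAAA", "BBBBBBBBBBBBBBBB",
                  "XXXXXXXXXXXXXXXX", "YYYYYYYYYYYYYYYY",
                  "My tmp dropbox dir", "/tmp", every_minutes=15,
                  includes="*.doc,*.pdf", excludes="resume")
print(json.dumps(meta, indent=2))
```

The printed document can be used as the body of the `PUT` to `_river/mydocs/_meta`.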
Adding another Dropbox river
We add another river with the following properties :
- AppKey: AAAAAAAAAAAAAAAA
- AppSecret: BBBBBBBBBBBBBBBB
- Token: 2XXXXXXXXXXXXXXX
- Secret: 2YYYYYYYYYYYYYYY
- Dropbox directory URL: /tmp2
- Update Rate: every hour (60 * 60 * 1000 = 3600000 ms)
- Get only docs like *.doc, *.xls and *.pdf
This river also indexes into the same index/type as the previous one:
- index: mydocs
- type: doc
$ curl -XPUT 'localhost:9200/_river/mynewriver/_meta' -d '{
"type": "dropbox",
"dropbox": {
"appkey": "AAAAAAAAAAAAAAAA",
"appsecret": "BBBBBBBBBBBBBBBB",
"token": "2XXXXXXXXXXXXXXX",
"secret": "2YYYYYYYYYYYYYYY",
"name": "My tmp2 dropbox dir",
"url": "/tmp2",
"update_rate": 3600000,
"includes": [ "*.doc" , "*.xls", "*.pdf" ]
},
"index": {
"index": "mydocs",
"type": "doc",
"bulk_size": 50
}
}'
Note that a river can index for another Dropbox application: its appkey and appsecret may differ from
those of the previous river.
You can also reuse the same credentials (appkey, appsecret, token, secret) as
the previous river if you only want to index another directory for the same user.
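When several directories of the same user are indexed, the river definitions differ only in their directory and update rate. The sketch below generates them from one shared set of credentials. `river_for_directory` is a hypothetical helper, and the values are the placeholders used in this README.

```python
import json


def river_for_directory(credentials, directory, update_rate_ms, index_settings):
    """Build one river definition; the shared dicts are copied, not mutated."""
    dropbox = dict(credentials)
    dropbox.update({"url": directory, "update_rate": update_rate_ms})
    return {"type": "dropbox", "dropbox": dropbox,
            "index": dict(index_settings)}


credentials = {
    "appkey": "AAAAAAAAAAAAAAAA",
    "appsecret": "BBBBBBBBBBBBBBBB",
    "token": "XXXXXXXXXXXXXXXX",
    "secret": "YYYYYYYYYYYYYYYY",
}
index_settings = {"index": "mydocs", "type": "doc", "bulk_size": 50}

# River name -> (Dropbox directory, update rate in ms)
directories = {"mydocs": ("/tmp", 900000), "mynewriver": ("/tmp2", 3600000)}

rivers = {
    name: river_for_directory(credentials, directory, rate, index_settings)
    for name, (directory, rate) in directories.items()
}

for name, meta in rivers.items():
    print(f"PUT /_river/{name}/_meta")
    print(json.dumps(meta, indent=2))
```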
Searching for docs
This is a common use case in Elasticsearch: we want to search for something ;-)
$ curl -XGET http://localhost:9200/mydocs/doc/_search -d '{
"query" : {
"match" : {
"_all" : "I am searching for something !"
}
}
}'
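The same search can be prepared from Python. The sketch below only builds the request and leaves sending it to the caller. `build_search_request` is a hypothetical helper name, and it uses POST, which the `_search` endpoint accepts as well as GET.

```python
import json
from urllib.request import Request


def build_search_request(index, doc_type, text, host="http://localhost:9200"):
    """Build (but do not send) a match query on the _all field."""
    body = {"query": {"match": {"_all": text}}}
    return Request(
        f"{host}/{index}/{doc_type}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "mydocs" is the index created earlier in this README.
req = build_search_request("mydocs", "doc", "I am searching for something !")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` sends it to a running node.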
Advanced
Autogenerated mapping
When the Dropbox River detects a new type, it automatically creates a mapping for this type.
{
"doc" : {
"properties" : {
"file" : {
"type" : "attachment",
"path" : "full",
"fields" : {
"file" : {
"type" : "string",
"store" : "yes",
"term_vector" : "with_positions_offsets"
},
"author" : {
"type" : "string"
},
"title" : {
"type" : "string",
"store" : "yes"
},
"name" : {
"type" : "string"
},
"date" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"keywords" : {
"type" : "string"
},
"content_type" : {
"type" : "string"
}
}
},
"name" : {
"type" : "string",
"analyzer" : "keyword"
},
"pathEncoded" : {
"type" : "string",
"analyzer" : "keyword"
},
"postDate" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"rootpath" : {
"type" : "string",
"analyzer" : "keyword"
},
"virtualpath" : {
"type" : "string",
"analyzer" : "keyword"
}
}
}
}
Creating your own mapping (analyzers)
If you want to define your own mapping, for example to set analyzers, you can push the mapping before starting the Dropbox River.
{
"doc" : {
"properties" : {
"file" : {
"type" : "attachment",
"path" : "full",
"fields" : {
"file" : {
"type" : "string",
"store" : "yes",
"term_vector" : "with_positions_offsets",
"analyzer" : "french"
},
"author" : {
"type" : "string"
},
"title" : {
"type" : "string",
"store" : "yes"
},
"name" : {
"type" : "string"
},
"date" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"keywords" : {
"type" : "string"
},
"content_type" : {
"type" : "string"
}
}
},
"name" : {
"type" : "string",
"analyzer" : "keyword"
},
"pathEncoded" : {
"type" : "string",
"analyzer" : "keyword"
},
"postDate" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"rootpath" : {
"type" : "string",
"analyzer" : "keyword"
},
"virtualpath" : {
"type" : "string",
"analyzer" : "keyword"
}
}
}
}
To send the mapping to Elasticsearch, refer to the Put Mapping API.
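As a rough illustration, the request that pushes a mapping could be prepared as follows. This sketch assumes the `/{index}/{type}/_mapping` URL layout used by Elasticsearch versions of that era, so check the Put Mapping API documentation for your version. `build_put_mapping_request` is a hypothetical helper, and the mapping is shortened to a single field to keep the example small.

```python
import json
from urllib.request import Request


def build_put_mapping_request(index, doc_type, mapping,
                              host="http://localhost:9200"):
    """Build (but do not send) a PUT request carrying a type mapping.

    The URL layout is an assumption; verify it for your version.
    """
    return Request(
        f"{host}/{index}/{doc_type}/_mapping",
        data=json.dumps(mapping).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Shortened mapping: only the attachment content field with its analyzer.
mapping = {
    "doc": {
        "properties": {
            "file": {
                "type": "attachment",
                "path": "full",
                "fields": {
                    "file": {
                        "type": "string",
                        "store": "yes",
                        "term_vector": "with_positions_offsets",
                        "analyzer": "french",
                    }
                },
            }
        }
    }
}

req = build_put_mapping_request("mydocs", "doc", mapping)
print(req.get_method(), req.full_url)
```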
Meta fields
Dropbox River creates some meta fields :
| Field | Description | Example |
| name | Original file name | mydocument.pdf |
| pathEncoded | BASE64 encoded file path (for internal use) | 112aed83738239dbfe4485f024cd4ce1 |
| postDate | Indexing date | 1312893360000 |
| rootpath | BASE64 encoded root path (for internal use) | 112aed83738239dbfe4485f024cd4ce1 |
| virtualpath | Relative path | mydir/otherdir |
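The postDate example in the table looks like a timestamp in milliseconds since the Unix epoch. Under that assumption, which the source does not state explicitly, it can be turned into a readable date like this (`post_date_to_datetime` is a hypothetical helper name):

```python
from datetime import datetime, timezone


def post_date_to_datetime(post_date_ms):
    """Interpret postDate as epoch milliseconds in UTC (assumption)."""
    return datetime.fromtimestamp(post_date_ms / 1000, tz=timezone.utc)


print(post_date_to_datetime(1312893360000).isoformat())
# -> 2011-08-09T12:36:00+00:00
```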
Advanced search
You can search on the meta fields.
$ curl -XGET http://localhost:9200/mydocs/doc/_search -d '{
"query" : {
"term" : {
"name" : "mydocument.pdf"
}
}
}'
Behind the scenes
How does it work?
TO BE COMPLETED
License
This software is licensed under the Apache 2 license, quoted below.
Copyright 2011-2013 David Pilato
Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.