Dropbox River for Elasticsearch (PROJECT STOPPED)

Welcome to the Dropbox River Plugin for Elasticsearch

This river plugin indexes documents from your Dropbox account.

WARNING: This plugin requires the Attachment Plugin to be installed.

Versions

Dropbox River Plugin	Elasticsearch	Attachment Plugin
master (0.2.0)	0.21.0.Beta1-SNAPSHOT	1.6.0
0.1.0	0.20.4	1.6.0

Build Status

Thanks to CloudBees for the build status.

Getting Started

Installation

Just type:

$ bin/plugin -install fr.pilato.elasticsearch.river/dropbox/0.1.0

This will do the job...

-> Installing fr.pilato.elasticsearch.river/dropbox/0.1.0...
Trying http://download.elasticsearch.org/fr.pilato.elasticsearch.river/dropbox/dropbox-0.1.0.zip...
Trying http://search.maven.org/remotecontent?filepath=fr/pilato/elasticsearch/river/dropbox/0.1.0/dropbox-0.1.0.zip...
Trying https://oss.sonatype.org/service/local/repositories/releases/content/fr/pilato/elasticsearch/river/dropbox/0.1.0/dropbox-0.1.0.zip...
Downloading ......DONE
Installed dropbox

Get Dropbox credentials (token and secret)

First, you need to create your own application in Dropbox Developers.

If you create a Full Dropbox application, you will have access to all folders.

If you create an App folder application, you will only have access to the files in your app folder. You will get Dropbox HTTP Error 403 : {"error": "Forbidden"} errors when accessing other folders.

Note your AppKey and your AppSecret.

You then need to get authorization from the user for this new application.

Call the _dropbox REST endpoint with your AppKey and AppSecret as path parameters: http://localhost:9200/_dropbox/oauth/AppKey/AppSecret

$ curl http://localhost:9200/_dropbox/oauth/AppKey/AppSecret

You will get back a URL:

{
  "oauth_token":"OAUTHTOKEN",
  "oauth_secret":"OAUTHSECRET",
  "url" : "https://www.dropbox.com/1/oauth/authorize?oauth_token=OAUTHTOKEN"
}

Open the URL in your browser. Dropbox will ask you to allow your application to access your Dropbox account. If you added an oauth_callback parameter to the URL, Dropbox will redirect your user to that endpoint.

For example, https://www.dropbox.com/1/oauth/authorize?oauth_token=OAUTHTOKEN&oauth_callback=http://yourwebserver/callback will redirect your user to http://yourwebserver/callback if the user allows your application to access their Dropbox folders.
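
As a small sketch of the callback case, the helper below assembles the authorization URL from the oauth_token returned by the first call. The name authorize_url is hypothetical, and the callback is appended verbatim, exactly as in the example; whether Dropbox expects it to be percent-encoded is not covered here.

```shell
# Hypothetical helper: build the Dropbox authorization URL.
# $1 = oauth_token from the _dropbox/oauth reply
# $2 = optional callback URL (appended as-is, as in the example)
authorize_url() {
  token=$1
  callback=${2:-}
  url="https://www.dropbox.com/1/oauth/authorize?oauth_token=${token}"
  if [ -n "$callback" ]; then
    url="${url}&oauth_callback=${callback}"
  fi
  printf '%s\n' "$url"
}
```

Calling authorize_url OAUTHTOKEN http://yourwebserver/callback prints the same URL as the example.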

Once you get back the success reply from Dropbox, you can get the user token and secret by calling:

$ curl http://localhost:9200/_dropbox/oauth/AppKey/AppSecret/OAUTHTOKEN/OAUTHSECRET

You will get back a JSON document like the following:

{
  "token" : "yourtoken",
  "secret" : "yoursecret"
}

Use these two values when you create the river (see below).
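
To script this step, the token and secret have to be pulled out of the JSON reply. Here is a minimal sketch, assuming flat replies with plain string values like the ones shown above. json_field is a hypothetical helper, and a real JSON parser is the safer choice for anything more complex.

```shell
# Hypothetical helper: print the string value of a JSON field.
# Reads JSON on stdin; $1 is the field name. Only suitable for flat
# documents whose values contain no escaped quotes.
json_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

For instance, after storing the reply with reply=$(curl -s http://localhost:9200/_dropbox/oauth/AppKey/AppSecret/OAUTHTOKEN/OAUTHSECRET), the token can be read with printf '%s\n' "$reply" | json_field token, and the secret the same way with json_field secret.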

By the way, you can use the SettingUpDropboxTestsCases test class to get a token and a secret for your user.

Creating a Dropbox river

First, we create an index to store our documents (optional):

$ curl -XPUT 'localhost:9200/mydocs/' -d '{}'

We create the river with the following properties:

AppKey: AAAAAAAAAAAAAAAA
AppSecret: BBBBBBBBBBBBBBBB
Token: XXXXXXXXXXXXXXXX
Secret: YYYYYYYYYYYYYYYY
Dropbox directory URL: /tmp
Update rate: every 15 minutes (15 * 60 * 1000 = 900000 ms)
Get only docs like *.doc and *.pdf
Don't index resume*

$ curl -XPUT 'localhost:9200/_river/mydocs/_meta' -d '{
  "type": "dropbox",
  "dropbox": {
    "appkey": "AAAAAAAAAAAAAAAA",
    "appsecret": "BBBBBBBBBBBBBBBB",
    "token": "XXXXXXXXXXXXXXXX",
    "secret": "YYYYYYYYYYYYYYYY",
    "name": "My tmp dropbox dir",
    "url": "/tmp",
    "update_rate": 900000,
    "includes": "*.doc,*.pdf",
    "excludes": "resume"
  }
}'
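
Two of these settings are easy to get wrong: the update rate, which is given in milliseconds, and the include/exclude patterns. The sketch below computes the first and illustrates the intent of the second. Both function names are hypothetical, and would_index only mimics the filter described in the property list by treating the patterns as shell-style wildcards; how the river itself matches file names is an assumption.

```shell
# Convert minutes to the milliseconds expected by update_rate.
minutes_to_ms() {
  echo $(( $1 * 60 * 1000 ))
}

# Illustration only: return 0 if a file name passes the filter
# "get only *.doc and *.pdf, don't index resume*".
would_index() {
  case $1 in
    resume*) return 1 ;;        # excluded
    *.doc|*.pdf) return 0 ;;    # included
    *) return 1 ;;              # everything else is skipped
  esac
}
```

minutes_to_ms 15 prints 900000, the value used above. With this filter, report.pdf would be indexed, while resume.pdf and notes.txt would not.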

Adding another Dropbox river

We add another river with the following properties:

AppKey: AAAAAAAAAAAAAAAA
AppSecret: BBBBBBBBBBBBBBBB
Token: 2XXXXXXXXXXXXXXX
Secret: 2YYYYYYYYYYYYYYY
Dropbox directory URL: /tmp2
Update rate: every hour (60 * 60 * 1000 = 3600000 ms)
Get only docs like *.doc, *.xls and *.pdf

We also configure it to index into the same index and type as the previous river:

index: mydocs
type: doc

$ curl -XPUT 'localhost:9200/_river/mynewriver/_meta' -d '{
  "type": "dropbox",
  "dropbox": {
    "appkey": "AAAAAAAAAAAAAAAA",
    "appsecret": "BBBBBBBBBBBBBBBB",
    "token": "2XXXXXXXXXXXXXXX",
    "secret": "2YYYYYYYYYYYYYYY",
    "name": "My tmp2 dropbox dir",
    "url": "/tmp2",
    "update_rate": 3600000,
    "includes": [ "*.doc" , "*.xls", "*.pdf" ]
  },
  "index": {
    "index": "mydocs",
    "type": "doc",
    "bulk_size": 50
  }
}'

Note that a river can index for a different Dropbox application (its appkey and appsecret may differ from those of the previous river).

You can also reuse the same credentials (appkey, appsecret, token, secret) as the previous river if you only want to index another directory for the same user.
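
The two rivers above write "includes" in two ways: as a comma-separated string and as a JSON array. Assuming both forms are equivalent (the examples suggest it, but it is not stated explicitly), a hypothetical helper like the following converts the first form into the second when generating river definitions from a script.

```shell
# Hypothetical helper: turn "*.doc,*.xls,*.pdf" into a JSON array.
# Patterns must not contain commas, quotes or backslashes.
csv_to_json_array() {
  printf '%s\n' "$1" | sed 's/,/", "/g; s/^/["/; s/$/"]/'
}
```

csv_to_json_array '*.doc,*.xls,*.pdf' prints ["*.doc", "*.xls", "*.pdf"]. Keep the argument in single quotes so the shell does not expand the wildcards.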

Searching for docs

This is a common use case in Elasticsearch: we want to search for something ;-)

$ curl -XGET http://localhost:9200/mydocs/doc/_search -d '{
  "query" : {
    "match" : {
        "_all" : "I am searching for something !"
    }
  }
}'
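
When the search text comes from a user, it has to be escaped before it is placed inside the JSON body. Here is a minimal sketch with two hypothetical helpers; it handles only backslashes and double quotes, not control characters.

```shell
# Escape backslashes and double quotes so a string can sit inside
# a JSON string literal.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Print a match query on the _all field for the given text.
match_all_body() {
  printf '{"query":{"match":{"_all":"%s"}}}\n' "$(json_escape "$1")"
}
```

The result can be passed to the search URL shown above with -d "$(match_all_body 'I am searching for something !')".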

Advanced

Autogenerated mapping

When the Dropbox river detects a new type, it automatically creates a mapping for this type.

{
  "doc" : {
    "properties" : {
      "file" : {
        "type" : "attachment",
        "path" : "full",
        "fields" : {
          "file" : {
            "type" : "string",
            "store" : "yes",
            "term_vector" : "with_positions_offsets"
          },
          "author" : {
            "type" : "string"
          },
          "title" : {
            "type" : "string",
            "store" : "yes"
          },
          "name" : {
            "type" : "string"
          },
          "date" : {
            "type" : "date",
            "format" : "dateOptionalTime"
          },
          "keywords" : {
            "type" : "string"
          },
          "content_type" : {
            "type" : "string"
          }
        }
      },
      "name" : {
        "type" : "string",
        "analyzer" : "keyword"
      },
      "pathEncoded" : {
        "type" : "string",
        "analyzer" : "keyword"
      },
      "postDate" : {
        "type" : "date",
        "format" : "dateOptionalTime"
      },
      "rootpath" : {
        "type" : "string",
        "analyzer" : "keyword"
      },
      "virtualpath" : {
        "type" : "string",
        "analyzer" : "keyword"
      }
    }
  }
}

Creating your own mapping (analyzers)

If you want to define your own mapping, for example to set analyzers, you can push the mapping before starting the Dropbox River. The mapping below sets the french analyzer on the file field.

{
  "doc" : {
    "properties" : {
      "file" : {
        "type" : "attachment",
        "path" : "full",
        "fields" : {
          "file" : {
            "type" : "string",
            "store" : "yes",
            "term_vector" : "with_positions_offsets",
            "analyzer" : "french"
          },
          "author" : {
            "type" : "string"
          },
          "title" : {
            "type" : "string",
            "store" : "yes"
          },
          "name" : {
            "type" : "string"
          },
          "date" : {
            "type" : "date",
            "format" : "dateOptionalTime"
          },
          "keywords" : {
            "type" : "string"
          },
          "content_type" : {
            "type" : "string"
          }
        }
      },
      "name" : {
        "type" : "string",
        "analyzer" : "keyword"
      },
      "pathEncoded" : {
        "type" : "string",
        "analyzer" : "keyword"
      },
      "postDate" : {
        "type" : "date",
        "format" : "dateOptionalTime"
      },
      "rootpath" : {
        "type" : "string",
        "analyzer" : "keyword"
      },
      "virtualpath" : {
        "type" : "string",
        "analyzer" : "keyword"
      }
    }
  }
}

To send the mapping to Elasticsearch, refer to the Put Mapping API.
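
As a sketch of that step, the helpers below create the index and then push a mapping stored in a file, which must happen before the river is created. They assume the type-level endpoint of the Elasticsearch releases of this era (PUT /{index}/{type}/_mapping). The function names and the mapping.json file name are hypothetical.

```shell
# Build the Put Mapping URL for an index and type.
# $1 = base URL, $2 = index, $3 = type
mapping_url() {
  printf '%s/%s/%s/_mapping\n' "$1" "$2" "$3"
}

# Create the index, then push the mapping stored in a file.
# $1 = base URL, $2 = index, $3 = type, $4 = path to the mapping JSON
push_mapping() {
  curl -XPUT "$1/$2/" -d '{}' &&
    curl -XPUT "$(mapping_url "$1" "$2" "$3")" -d "@$4"
}
```

For example, save the mapping above as mapping.json, run push_mapping http://localhost:9200 mydocs doc mapping.json, and only then create the river.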

Meta fields

Dropbox River creates some meta fields:

Field	Description	Example
name	Original file name	mydocument.pdf
pathEncoded	BASE64 encoded file path (for internal use)	112aed83738239dbfe4485f024cd4ce1
postDate	Indexing date	1312893360000
rootpath	BASE64 encoded root path (for internal use)	112aed83738239dbfe4485f024cd4ce1
virtualpath	Relative path	mydir/otherdir
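
The postDate example looks like a timestamp in epoch milliseconds. Under that assumption, a hypothetical helper such as the one below turns it into a readable UTC date. It relies on the "-d @SECONDS" syntax of GNU date, so it will not work unchanged with BSD date.

```shell
# Convert epoch milliseconds to an ISO 8601 UTC timestamp.
# Relies on GNU date's "-d @SECONDS" syntax.
epoch_ms_to_utc() {
  date -u -d "@$(( $1 / 1000 ))" '+%Y-%m-%dT%H:%M:%SZ'
}
```

epoch_ms_to_utc 1312893360000 prints 2011-08-09T12:36:00Z.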

Advanced search

You can also search on meta fields.

$ curl -XGET http://localhost:9200/mydocs/doc/_search -d '{
  "query" : {
    "term" : {
        "name" : "mydocument.pdf"
    }
  }
}'

Behind the scenes

How does it work?

TO BE COMPLETED

License

This software is licensed under the Apache 2 license, quoted below.

Copyright 2011-2013 David Pilato

Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.
