Read only mode

Open Rigellute opened this issue 6 years ago • 14 comments

My problem is that I am attempting to deploy tantivy to an environment which has a read-only file system.

Creating the index as follows

let index = Index::open_in_dir("my-index-path").unwrap();

results in this error

"Read-only file system"

It seems that if .tantivy-meta.lock is read only, creating the index gives this error. All other files in the index directory can be read-only, however.

Describe the solution you'd like

One solution might be an option to open the index in read-only mode, in which you cannot commit or merge any new documents at runtime.

Perhaps this could avoid the need for acquiring locks and so on?

I am happy to build the index locally/on CI first and then deploy the index as a static file.

(The data I want to search through will change very infrequently).

Rigellute avatar Jun 02 '19 17:06 Rigellute

I took the liberty to remove the part about fuzzy search because it was off-topic.

Thanks for the report. That's indeed super important.

fulmicoton avatar Jun 03 '19 03:06 fulmicoton

This is a great issue. I wanted to add some ideas and see if they are useful to users and implementers of this ticket.

I imagine this to be a frequent use case. It's important to get it right, so that it's easy for applications to use tantivy and work on static indexes.

Interface

  • [ ] Add a "read-only" feature to Cargo.toml
  • [ ] Mark all writing interfaces with macros to exclude them, when compiling with the read-only feature.
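
A minimal sketch of what gating write interfaces behind a Cargo feature could look like. The read-only feature name comes from the checklist above; the StaticIndex type and its methods are hypothetical stand-ins, not tantivy's API.

```rust
// Assumed Cargo.toml entry:
// [features]
// read-only = []

/// Hypothetical stand-in for an index type; not tantivy's API.
pub struct StaticIndex {
    docs: Vec<String>,
}

impl StaticIndex {
    pub fn from_docs(docs: Vec<String>) -> Self {
        StaticIndex { docs }
    }

    /// Reading is always compiled in.
    pub fn num_docs(&self) -> usize {
        self.docs.len()
    }

    /// Writing is compiled out when the crate is built with
    /// `--features read-only`, so callers cannot even reference it.
    #[cfg(not(feature = "read-only"))]
    pub fn add_doc(&mut self, doc: String) {
        self.docs.push(doc);
    }
}
```

With this approach a misuse shows up as a compile error rather than a runtime failure, at the cost of maintaining two build configurations.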

Testing

  • [ ] Add an example for a read-only index application
  • [ ] Set up a read-only docker container to test the example. @Rigellute, please share your Dockerfile, if you can.

petr-tik avatar Jun 03 '19 14:06 petr-tik

Having a readonly feature flag is not necessarily a bad idea, but I don't think this should be the solution for this issue.

As a simple workaround for @Rigellute, I suggest we let people define an IndexReader that has a ReadOnly ReloadPolicy mode that does not lock the directory.

Alternatively, we could also consider making the lock acquisition optional, and go on with loading the index.

  • If loading the index succeeds, just accept it.
  • If loading the index fails (e.g. one file got garbage collected while we were opening the files, because we did not have the lock), return an error.

If so, we need to somehow append to the Error the information that the lock could not be acquired, along with the reason. I think this would require refactoring tantivy's errors quite a bit.
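
The optional-lock idea can be sketched roughly as follows. Everything here is hypothetical: OpenError, open_best_effort, and the two closures standing in for lock acquisition and index loading are illustration only, not tantivy's types.

```rust
use std::fmt;

/// Hypothetical error type; tantivy's real error enum differs.
#[derive(Debug)]
pub struct OpenError {
    pub cause: String,
    /// Set when the lock could not be taken before loading.
    pub lock_failure: Option<String>,
}

impl fmt::Display for OpenError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to open index: {}", self.cause)?;
        if let Some(reason) = &self.lock_failure {
            write!(f, " (opened without lock: {})", reason)?;
        }
        Ok(())
    }
}

/// Try to take the lock, but keep going if that fails.
/// `acquire_lock` and `load` are hypothetical closures standing in for
/// the directory lock and the index-loading step.
pub fn open_best_effort<T, G>(
    acquire_lock: impl FnOnce() -> Result<G, String>,
    load: impl FnOnce() -> Result<T, String>,
) -> Result<T, OpenError> {
    // Remember why the lock failed instead of returning early.
    let (guard, lock_failure) = match acquire_lock() {
        Ok(guard) => (Some(guard), None),
        Err(reason) => (None, Some(reason)),
    };
    // Only a failed load is reported; the lock failure rides along as context.
    let result = load().map_err(|cause| OpenError { cause, lock_failure });
    // Hold the lock (if any) until loading has finished.
    drop(guard);
    result
}
```

If the load succeeds, the lock failure is silently discarded, which matches the "just accept it" branch above.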

fulmicoton avatar Jun 04 '19 00:06 fulmicoton

Thanks for the responses!

@fulmicoton your suggested workaround sounds good for my usecase.

Is your IndexReader workaround already implemented? Or is that something that would still need to be worked on?

Rigellute avatar Jun 04 '19 16:06 Rigellute

@Rigellute

No fix is implemented at this point, but there will probably be something in the next version of tantivy.

In the meantime, you can solve your problem in tantivy-0.9 by creating a custom directory that wraps the MmapDirectory.

Simply forward all of the methods of the Directory trait to the MmapDirectory you wrap, except acquire_lock, for which you can simply return Ok(Box::new(()))

fulmicoton avatar Jun 06 '19 01:06 fulmicoton

I would like to turn what I have here into your suggested workaround, however, I am not exactly sure how.

let index = Index::open_in_dir("my-index").unwrap();
index.tokenizers().register(
    "commoncrawl",
    SimpleTokenizer
        .filter(RemoveLongFilter::limit(40))
        .filter(LowerCaser)
        .filter(AlphaNumOnlyFilter)
        .filter(Stemmer::new(Language::English)),
);
let schema = index.schema();
let default_fields: Vec<Field> = schema
    .fields()
    .iter()
    .enumerate()
    .filter(|&(_, ref field_entry)| match *field_entry.field_type() {
        FieldType::Str(ref text_field_options) => {
            text_field_options.get_indexing_options().is_some()
        }
        _ => false,
    })
    .map(|(i, _)| Field(i as u32))
    .collect();
let query_parser = QueryParser::new(schema.clone(), default_fields, index.tokenizers().clone());
let reader = index.reader().unwrap();

// Start querying!

Are you able to suggest how to convert this code into your workaround?

Simply forward all of the methods of the Directory trait to the MmapDirectory you wrap, except acquire_lock, for which you can simply return Ok(Box::new(()))

Rigellute avatar Jun 09 '19 09:06 Rigellute

Your code would look something like this. (This is just a rough outline; don't expect the code below to compile as is. I am typing directly in GitHub's editor.)


use tantivy::directory::{Directory, DirectoryLock, MmapDirectory};

pub struct ReadOnlyDirectoryWrapper(pub MmapDirectory);

impl Directory for ReadOnlyDirectoryWrapper {
    // Never touch the lock file: hand back a no-op lock instead.
    fn acquire_lock(&self, _lock: &Lock) -> Result<DirectoryLock, LockError> {
        Ok(DirectoryLock::from(Box::new(())))
    }

    // ... just delegate all of the other methods to MmapDirectory
    ...
}

fn main() -> tantivy::Result<()> {
    let mmap_directory = MmapDirectory::open("my-index")?;
    let index = Index::open(ReadOnlyDirectoryWrapper(mmap_directory))?;
    // the rest of your code...
    Ok(())
}

fulmicoton avatar Jun 09 '19 11:06 fulmicoton

Ah, I see! Thank you. I have managed to get it working in the read-only filesystem 🎉 ! In case others run into this issue, here is the code I used to get this workaround working:

#[derive(Debug, Clone)]
pub struct ReadOnlyDirectoryWrapper {
    inner: MmapDirectory,
}

struct HasDrop;

impl Drop for HasDrop {
    fn drop(&mut self) {
        println!("Dropping!");
    }
}

impl Directory for ReadOnlyDirectoryWrapper {
    fn acquire_lock(&self, _lock: &Lock) -> Result<DirectoryLock, LockError> {
        Ok(DirectoryLock::from(Box::new(HasDrop)))
    }

    fn open_read(&self, path: &Path) -> result::Result<ReadOnlySource, OpenReadError> {
        MmapDirectory::open_read(&self.inner, path)
    }

    fn delete(&self, path: &Path) -> result::Result<(), DeleteError> {
        MmapDirectory::delete(&self.inner, path)
    }

    fn exists(&self, path: &Path) -> bool {
        MmapDirectory::exists(&self.inner, path)
    }

    fn open_write(&mut self, path: &Path) -> Result<WritePtr, OpenWriteError> {
        MmapDirectory::open_write(&mut self.inner, path)
    }

    fn atomic_read(&self, path: &Path) -> Result<Vec<u8>, OpenReadError> {
        MmapDirectory::atomic_read(&self.inner, path)
    }

    fn atomic_write(&mut self, path: &Path, data: &[u8]) -> io::Result<()> {
        MmapDirectory::atomic_write(&mut self.inner, path, data)
    }

    fn watch(&self, watch_callback: WatchCallback) -> WatchHandle {
        MmapDirectory::watch(&self.inner, watch_callback)
    }
}

...

let mmap_directory = MmapDirectory::open("my-index").unwrap();
let index = Index::open(ReadOnlyDirectoryWrapper {
  inner: mmap_directory,
})
.unwrap();
// Off you go

Rigellute avatar Jun 10 '19 10:06 Rigellute

@Rigellute Good job.

I suggest you also call unreachable! instead of delegating the writing operations, atomic_write and open_write. It does not matter much if you are confident your directory and files are read-only, though.
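
A self-contained sketch of that pattern, assuming a simplified stand-in trait. SimpleDirectory, RamDir and ReadOnly are hypothetical and far smaller than tantivy's Directory trait; they only illustrate delegating reads while treating writes as a bug.

```rust
use std::collections::HashMap;
use std::io;

/// Simplified stand-in for a directory trait (hypothetical, not tantivy's `Directory`).
pub trait SimpleDirectory {
    fn atomic_read(&self, path: &str) -> io::Result<Vec<u8>>;
    fn atomic_write(&mut self, path: &str, data: &[u8]) -> io::Result<()>;
}

/// In-memory directory used only to make the sketch self-contained.
#[derive(Default)]
pub struct RamDir {
    files: HashMap<String, Vec<u8>>,
}

impl SimpleDirectory for RamDir {
    fn atomic_read(&self, path: &str) -> io::Result<Vec<u8>> {
        self.files
            .get(path)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, path.to_string()))
    }

    fn atomic_write(&mut self, path: &str, data: &[u8]) -> io::Result<()> {
        self.files.insert(path.to_string(), data.to_vec());
        Ok(())
    }
}

/// Wrapper that forwards reads and panics on writes.
pub struct ReadOnly<D: SimpleDirectory>(pub D);

impl<D: SimpleDirectory> SimpleDirectory for ReadOnly<D> {
    fn atomic_read(&self, path: &str) -> io::Result<Vec<u8>> {
        // Reads are delegated unchanged.
        self.0.atomic_read(path)
    }

    fn atomic_write(&mut self, _path: &str, _data: &[u8]) -> io::Result<()> {
        // A write on a read-only index indicates a logic error, so fail loudly.
        unreachable!("atomic_write called on a read-only directory")
    }
}
```

Panicking makes an accidental write obvious during testing; returning an io::Error instead would be the gentler alternative if the caller is expected to recover.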

fulmicoton avatar Jun 12 '19 00:06 fulmicoton

Hey @Rigellute, I would like to a) give you and others an easy interface for read-only index work b) add error handling that makes it easy for others to spot the same problems in the future.

I have been reading through your bug report and the source code.

Creating the index as follows

let index = Index::open_in_dir("my-index-path").unwrap();

results in this error

"Read-only file system"

My ideas so far are the following: Index::open_in_dir first opens an MmapDirectory, then passes this MmapDirectory to Index::open to create an Index. Reading the source for MmapDirectory::open I don't see where we can throw such an error.

However, Index::open calls ManagedDirectory::wrap, inside of which we call match directory.atomic_read, the last line of which does very minimal error handling. I suspect the line below is responsible for throwing the error.

https://github.com/tantivy-search/tantivy/blob/e2da92fcb588b465c337ab0f465741a8696c0756/src/directory/managed_directory.rs#L87

Can you please provide me with a case for repro? I would like to step through it with a debugger, if possible. What version of tantivy did you build? Can you please give me a representative index directory structure (Docker container that includes FS restrictions)?

petr-tik avatar Jun 15 '19 13:06 petr-tik

It seems that if .tantivy-meta.lock is read only, creating the index gives this error. All other files in the index directory can be read-only, however.

I think you are right. .tantivy-meta.lock is acquired in the garbage_collect and reload methods. I think if we exclude those methods from a read-only version of tantivy, we can prevent other problems with the lock.

@fulmicoton

* If loading the index fails (e.g. one file got garbage collected while we were opening the files, because we did not have the lock), return an error.

If so, we need to somehow append to the Error the information that the lock could not be acquired, along with the reason. I think this would require refactoring tantivy's errors quite a bit.

If the index is read only can files ever be garbage collected?

I am all for adding relevant information to the Error type. However, I think excluding locks, index writers, and write/delete methods from the directory and index traits might make it easier for everyone.

petr-tik avatar Jun 15 '19 13:06 petr-tik

A read-only directory cannot be garbage collected, no.

fulmicoton avatar Jun 17 '19 01:06 fulmicoton

@Rigellute

I have been trying to reproduce the bug according to your description.

It seems that if .tantivy-meta.lock is read only, creating the index gives this error. All other files in the index directory can be read-only, however.

In my experiments [0][1], I ran a small index-reading application in a read-only FS (mounted on Docker) with a .tantivy-meta.lock stripped of write and execute permissions. The "Read-only file system" error message doesn't reproduce.

[0] http://petr-tik.github.io/posts/permissions_arent_mounts/
[1] http://petr-tik.github.io/posts/dockerise_an_ro_filesystem/
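
For anyone trying to tell the two situations apart: on Linux, a write to a read-only mount fails with EROFS (errno 30, "Read-only file system"), while a write blocked by file permissions fails with EACCES (errno 13, "Permission denied"). A small sketch that classifies an io::Error accordingly; the WriteFailure enum and classify function are hypothetical helpers, and the errno values assume a Linux target.

```rust
use std::io;

/// Linux errno values (assumption: a Linux target; other platforms may differ).
const EACCES: i32 = 13;
const EROFS: i32 = 30;

#[derive(Debug, PartialEq)]
pub enum WriteFailure {
    /// The filesystem itself is mounted read-only.
    ReadOnlyMount,
    /// The mount is writable, but this file or directory is not.
    PermissionDenied,
    Other,
}

/// Classify a failed write by its raw OS error code.
pub fn classify(err: &io::Error) -> WriteFailure {
    match err.raw_os_error() {
        Some(EROFS) => WriteFailure::ReadOnlyMount,
        Some(EACCES) => WriteFailure::PermissionDenied,
        _ => WriteFailure::Other,
    }
}
```

Logging the result of such a check when opening the lock file fails would show which of the two conditions an environment actually hits.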

I will work on read-only mode based on my understanding of tantivy.

I am not sure whether the feature will be usable in your application, since I have not been able to reproduce the environment or the bug.

petr-tik avatar Jun 18 '19 22:06 petr-tik

Is this still planned? I'd also like to use tantivy for a static index which is served from a readonly fs.

marioreggiori avatar Oct 22 '25 06:10 marioreggiori