Cache Hadoop FileSystem instances on the Gravitino server to improve performance
Currently, all Gravitino file system providers use the following code
(taking HDFSFileSystemProvider as an example)
FileSystem.newInstance always creates a new FileSystem on every call, even though the instances are mostly identical. Hadoop's FileSystem does have a cache mechanism: if we use FileSystem.get, that cache takes effect. However, because the Gravitino Virtual File System (GVFS) client also shares the FileSystemProviders and supports credentials for each unique path, we should be cautious when enabling the cache at the FileSystem level. In summary:
- On the Gravitino server side, we can enable the cache at the FileSystem level
- In GVFS, we need to disable it at the FileSystem level and cache file system instances at the GVFS level
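
To make the GVFS half of this proposal concrete, here is a minimal sketch of what a GVFS-level cache could look like. It is not Gravitino code: the class `GvfsFileSystemCache`, its `Key` type, and the `credentialId` field are hypothetical names chosen for illustration, and the file system type is left generic so the example does not depend on Hadoop. The idea is only that the cache key includes the credential identity as well as the scheme and authority, so two paths with different credentials never share an instance.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical GVFS-level cache; FS stands in for a file system handle type. */
public class GvfsFileSystemCache<FS> {

  /** Assumed cache key: scheme + authority + an identifier for the credentials in use. */
  public static final class Key {
    private final String scheme;
    private final String authority;
    private final String credentialId;

    public Key(String scheme, String authority, String credentialId) {
      this.scheme = scheme;
      this.authority = authority;
      this.credentialId = credentialId;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof Key)) {
        return false;
      }
      Key other = (Key) o;
      return Objects.equals(scheme, other.scheme)
          && Objects.equals(authority, other.authority)
          && Objects.equals(credentialId, other.credentialId);
    }

    @Override
    public int hashCode() {
      return Objects.hash(scheme, authority, credentialId);
    }
  }

  private final Map<Key, FS> cache = new ConcurrentHashMap<>();
  private final Function<Key, FS> factory;

  /** The factory creates a fresh, uncached file system for a key. */
  public GvfsFileSystemCache(Function<Key, FS> factory) {
    this.factory = factory;
  }

  /** Returns the cached instance for the key, creating it at most once per key. */
  public FS get(String scheme, String authority, String credentialId) {
    return cache.computeIfAbsent(new Key(scheme, authority, credentialId), factory);
  }

  /** Number of distinct instances currently cached. */
  public int size() {
    return cache.size();
  }
}
```

In a real client the factory would be the place to call the provider with Hadoop's own cache turned off (Hadoop exposes this per scheme through the `fs.<scheme>.impl.disable.cache` configuration property), and the cache would presumably also need eviction and closing of evicted instances, which this sketch leaves out.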
@yuqi1129 please assign it to me.
Of course, I will assign it to you if you are interested in this issue.
ok, thanks
Good catch!