[FEATURE] Add JuiceFS support in Fileset
Describe the feature
Fileset is a new concept introduced in 0.5.0 to manage non-tabular data; the current implementation uses HCFS to manage physical data. Currently, this HCFS-based implementation does not support JuiceFS.
This issue is for discussing how Fileset should support JuiceFS and how to implement that support.
Motivation
JuiceFS is a high-performance, cloud-native distributed file system that is developing rapidly. Supporting it would allow Gravitino to be used in more scenarios in the future.
Describe the solution
No response
Additional context
No response
As far as I know, the JuiceFS community edition provides a Hadoop SDK; please refer to: https://juicefs.com/docs/zh/community/hadoop_java_sdk. So I think JuiceFS can be supported in Fileset through the Hadoop SDK, in the same way as S3.
As @xloya mentions, JuiceFS is compatible with the HDFS API via its Java SDK and also supports the S3 API (ref). However, I highly recommend that Gravitino support POSIX for all generic file systems, including JuiceFS, Lustre, CephFS, and more.
@Suave Is Hadoop SDK a better choice for big data scenarios?
@theoryxu thanks for creating this issue. What I'm curious about is: what pain points or challenges do you face when using JuiceFS that Gravitino Fileset could overcome or solve? If you can share some of them, it will help others understand this feature. Thank you!
@Suave Is Hadoop SDK a better choice for big data scenarios?
Yes, the JuiceFS Java SDK works better in the Hadoop ecosystem; it is compatible with both Hadoop 2.x and 3.x.
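To make the "Hadoop SDK, like S3" idea concrete, here is a minimal sketch of how a Hadoop-compatible client can pick an implementation class from a location's URI scheme through an `fs.<scheme>.impl` setting. It is an illustration in Python, not Gravitino or Hadoop code: `CONF`, `impl_key`, and `resolve_impl` are hypothetical names, and the dictionary only stands in for a Hadoop configuration.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for a Hadoop configuration: plain key/value pairs.
# The jfs entry matches the core-site.xml posted later in this thread; the
# s3a entry is the usual class name of Hadoop's S3A connector.
CONF = {
    "fs.jfs.impl": "io.juicefs.JuiceFileSystem",
    "fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
}


def impl_key(location: str) -> str:
    """Build the `fs.<scheme>.impl` key for a location such as jfs://dev03/tmp/."""
    scheme = urlparse(location).scheme
    if not scheme:
        raise ValueError(f"location has no URI scheme: {location!r}")
    return f"fs.{scheme}.impl"


def resolve_impl(location: str, conf: dict) -> str:
    """Return the configured implementation class name for the location's scheme."""
    key = impl_key(location)
    if key not in conf:
        raise LookupError(f"no implementation configured under {key}")
    return conf[key]


if __name__ == "__main__":
    print(resolve_impl("jfs://dev03/tmp/", CONF))
    print(resolve_impl("s3a://some-bucket/data", CONF))
```

Under this model, supporting a new file system is mostly a matter of shipping its SDK jar and registering the scheme, which is why the Hadoop SDK route looks attractive for JuiceFS.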
Currently, Gravitino still does not support JuiceFS through the jfs provider or any other provider. Here are the reproduction steps:
- Put juicefs-hadoop-1.2.3.jar into /root/gravitino/catalogs/hadoop/libs/
- Create core-site.xml as below, put it at /root/gravitino/catalogs/hadoop/conf/core-site.xml and /etc/hive/conf/core-site.xml, and set the environment variable HADOOP_CONF_DIR=/etc/hive/conf:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>jfs://dev03</value>
</property>
<property>
<name>fs.jfs.impl</name>
<value>io.juicefs.JuiceFileSystem</value>
</property>
<property>
<name>fs.AbstractFileSystem.jfs.impl</name>
<value>io.juicefs.JuiceFS</value>
</property>
<property>
<name>juicefs.meta</name>
<value>redis://xxxxxx/3</value>
</property>
<property>
<name>juicefs.memory-size</name>
<value>1024</value>
</property>
</configuration>
- Restart Gravitino
- Create a catalog with filesystem-providers set to jfs and location set to jfs://dev03/tmp/
- Open the catalog; it throws the error below:
{"code":1006,"type":"UnsupportedOperationException","message":"Failed to operate schema(s) operation [LIST] under catalog [testsss], reason [File system providers [jfs] not found in the classpath. Please make sure the file system provider is in the classpath.]","stack":["java.lang.UnsupportedOperationException: File system providers [jfs] not found in the classpath. Please make sure the file system provider is in the classpath.",at org.apache.gravitino.catalog.hadoop.fs.FileSystemUtils.getFileSystemProviders(FileSystemUtils.java:84)",at org.apache.gravitino.catalog.hadoop.HadoopCatalogOperations.initialize(HadoopCatalogOperations.java:147)", at org.apache.gravitino.catalog.hadoop.SecureHadoopCatalogOperations.initialize(SecureHadoopCatalogOperations.java:92)", at org.apache.gravitino.connector.BaseCatalog.ops(BaseCatalog.java:172)", at org.apache.gravitino.catalog.CatalogManager$CatalogWrapper.asSchemas(CatalogManager.java:241)", at org.apache.gravitino.catalog.CatalogManager$CatalogWrapper.lambda$doWithSchemaOps$0(CatalogManager.java:145)", at org.apache.gravitino.utils.IsolatedClassLoader.withClassLoader(IsolatedClassLoader.java:86)", at org.apache.gravitino.catalog.CatalogManager$CatalogWrapper.doWithSchemaOps(CatalogManager.java:143)", at org.apache.gravitino.catalog.SchemaOperationDispatcher.lambda$listSchemas$1(SchemaOperationDispatcher.java:78)", at org.apache.gravitino.catalog.OperationDispatcher.doWithCatalog(OperationDispatcher.java:100)", at org.apache.gravitino.catalog.SchemaOperationDispatcher.listSchemas(SchemaOperationDispatcher.java:76)", at org.apache.gravitino.hook.SchemaHookDispatcher.listSchemas(SchemaHookDispatcher.java:54)", at org.apache.gravitino.catalog.SchemaNormalizeDispatcher.listSchemas(SchemaNormalizeDispatcher.java:48)", at org.apache.gravitino.listener.SchemaEventDispatcher.listSchemas(SchemaEventDispatcher.java:77)", at 
org.apache.gravitino.server.web.rest.SchemaOperations.lambda$listSchemas$0(SchemaOperations.java:91)", at org.apache.gravitino.lock.TreeLockUtils.doWithTreeLock(TreeLockUtils.java:49)", at org.apache.gravitino.server.web.rest.SchemaOperations.lambda$listSchemas$1(SchemaOperations.java:88)", at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)", at java.base/javax.security.auth.Subject.doAs(Subject.java:439)", at org.apache.gravitino.utils.PrincipalUtils.doAs(PrincipalUtils.java:39)", at org.apache.gravitino.server.web.Utils.doAs(Utils.java:188)", at org.apache.gravitino.server.web.rest.SchemaOperations.listSchemas(SchemaOperations.java:83)", at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)", at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)", at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)", at java.base/java.lang.reflect.Method.invoke(Method.java:568)", at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)", at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:146)", at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:189)", at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:176)", at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:93)", at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:478)", at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:400)", at 
org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81)", at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:256)", at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)", at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)", at org.glassfish.jersey.internal.Errors.process(Errors.java:292)", at org.glassfish.jersey.internal.Errors.process(Errors.java:274)", at org.glassfish.jersey.internal.Errors.process(Errors.java:244)", at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)", at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:235)", at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:684)", at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394)", at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)", at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:358)", at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:311)", at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)", at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)", at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656)", at org.apache.gravitino.server.authentication.AuthenticationFilter.doFilter(AuthenticationFilter.java:86)", at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)", at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)", at org.apache.gravitino.server.web.VersioningFilter.doFilter(VersioningFilter.java:111)", at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)", at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)", at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552)", at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)", at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:600)", at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)", at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)", at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)", at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)", at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)", at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)", at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)", at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)", at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)", at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)", at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)", at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)", at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)", at org.eclipse.jetty.server.Server.handle(Server.java:516)", at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)", at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)", at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)", at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)", at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)", at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)", at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)", at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)", at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)", at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)", at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)", at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)", at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)", at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)", at java.base/java.lang.Thread.run(Thread.java:833)"]}
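
One possible reading of this error, offered as an assumption rather than a confirmed diagnosis, is that the Hadoop-level settings in core-site.xml are fine, while the name given in filesystem-providers is looked up in a separate registry of providers that does not know `jfs`. The Python sketch below models that split. `load_properties`, `REGISTERED_PROVIDERS`, and `get_file_system_providers` are hypothetical names for illustration, the registry contents are invented, and none of this is Gravitino's actual code.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# The core-site.xml from the reproduction steps, with the redacted Redis
# address left exactly as posted.
CORE_SITE = """
<configuration>
  <property><name>fs.defaultFS</name><value>jfs://dev03</value></property>
  <property><name>fs.jfs.impl</name><value>io.juicefs.JuiceFileSystem</value></property>
  <property><name>fs.AbstractFileSystem.jfs.impl</name><value>io.juicefs.JuiceFS</value></property>
  <property><name>juicefs.meta</name><value>redis://xxxxxx/3</value></property>
  <property><name>juicefs.memory-size</name><value>1024</value></property>
</configuration>
"""

# Assumption: an illustrative registry of provider names known to the catalog.
# It has no "jfs" entry, even though the JuiceFS jar and config are in place.
REGISTERED_PROVIDERS = {"builtin-local": "file", "builtin-hdfs": "hdfs"}


def load_properties(xml_text: str) -> Dict[str, str]:
    """Parse a Hadoop-style configuration document into a name -> value dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def get_file_system_providers(names: str) -> List[str]:
    """Look up a comma-separated list of provider names; fail on unknown ones."""
    requested = [n.strip() for n in names.split(",") if n.strip()]
    missing = [n for n in requested if n not in REGISTERED_PROVIDERS]
    if missing:
        raise LookupError(
            f"File system providers [{', '.join(missing)}] not found in the classpath."
        )
    return requested


if __name__ == "__main__":
    props = load_properties(CORE_SITE)
    print("Hadoop-level impl for jfs:", props["fs.jfs.impl"])
    try:
        get_file_system_providers("jfs")
    except LookupError as err:
        print("Provider lookup failed:", err)
```

If this reading is right, dropping the JuiceFS jar into the libs directory is not enough on its own; something would also have to register a provider under the name `jfs`.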