
[FEATURE] Add JuiceFS support in Fileset

Open theoryxu opened this issue 1 year ago • 6 comments

Describe the feature

Fileset is a new concept introduced in Gravitino 0.5.0 for managing non-tabular data; the current implementation uses HCFS (Hadoop Compatible File System) to manage the physical data. At present, Fileset does not support JuiceFS.

In this issue, we should discuss how to support JuiceFS in Fileset and how to implement that support.

Motivation

JuiceFS is a high-performance, cloud-native distributed file system that is developing rapidly. Supporting it would let Gravitino serve more scenarios in the future.

Describe the solution

No response

Additional context

No response

theoryxu, Aug 05 '24 08:08

As far as I know, the JuiceFS Community Edition provides a Hadoop Java SDK (see https://juicefs.com/docs/zh/community/hadoop_java_sdk). So I think Fileset can support JuiceFS through the Hadoop SDK, the same way it supports S3.
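The Hadoop SDK route works because Hadoop picks a `FileSystem` implementation from the URI scheme, so a `jfs://` location can be served by `io.juicefs.JuiceFileSystem` just as `s3a://` is served by Hadoop's S3A connector. The following is a simplified, self-contained sketch of that lookup. It models only the `fs.<scheme>.impl` configuration key. Real Hadoop also falls back to file systems registered through `ServiceLoader` and caches instances; both are omitted here. The class and method names are illustrative, not part of any library.

```java
import java.net.URI;
import java.util.Map;
import java.util.Optional;

public class SchemeResolver {
    // Models how Hadoop maps a location to an implementation class:
    // take the URI scheme and look up "fs.<scheme>.impl" in the configuration.
    static Optional<String> resolveImplClass(String location, Map<String, String> conf) {
        String scheme = URI.create(location).getScheme();
        if (scheme == null) {
            // A bare path such as "/tmp/x" carries no scheme; Hadoop would use fs.defaultFS.
            return Optional.empty();
        }
        return Optional.ofNullable(conf.get("fs." + scheme + ".impl"));
    }

    public static void main(String[] args) {
        // Properties as they might appear in core-site.xml. The S3A class is Hadoop's
        // standard S3 connector; io.juicefs.JuiceFileSystem ships in the JuiceFS Hadoop SDK jar.
        Map<String, String> conf = Map.of(
                "fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem",
                "fs.jfs.impl", "io.juicefs.JuiceFileSystem");
        for (String loc : new String[] {"s3a://bucket/data", "jfs://dev03/tmp/", "oss://x/y"}) {
            System.out.println(loc + " -> "
                    + resolveImplClass(loc, conf).orElse("<no fs.*.impl configured>"));
        }
    }
}
```

Because the scheme alone selects the class, adding JuiceFS at the Hadoop level is a configuration change plus a jar on the classpath, with no new code.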

xloya, Aug 05 '24 08:08

As @xloya mentioned, JuiceFS is compatible with the HDFS API via its Java SDK and also supports the S3 API (ref). However, I strongly recommend that Gravitino support POSIX for all generic file systems, including JuiceFS, Lustre, CephFS, and more.
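One way to read the POSIX suggestion: once a JuiceFS, Lustre, or CephFS volume is mounted (for JuiceFS, through its FUSE client), it appears to applications as an ordinary directory. Plain `java.nio.file` code then works without any vendor SDK. The following is a minimal sketch, assuming a mount point passed on the command line (the path `/mnt/jfs/tmp` in the comment is hypothetical). With no argument, it falls back to a temporary directory so it runs anywhere.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PosixMountListing {
    // Returns the sorted names of regular files directly under dir. Nothing here is
    // file-system-specific: on a POSIX mount, the OS handles the remote storage.
    static List<String> listFileNames(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir;
        if (args.length > 0) {
            // e.g. a hypothetical JuiceFS mount point such as /mnt/jfs/tmp
            dir = Paths.get(args[0]);
        } else {
            // Fallback so the sketch runs without a mount: a temp dir with two files.
            dir = Files.createTempDirectory("fileset-demo");
            Files.writeString(dir.resolve("b.txt"), "b");
            Files.writeString(dir.resolve("a.txt"), "a");
        }
        System.out.println(listFileNames(dir));
    }
}
```

The trade-off against the Hadoop SDK route is that every Gravitino and client host needs the mount, while the SDK route only needs a jar and configuration.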

Suave, Aug 05 '24 12:08

@Suave Is the Hadoop SDK a better choice for big data scenarios?

2005hithlj, Aug 06 '24 07:08

@theoryxu, thanks for creating this issue. I'm curious: what pain points or challenges do you face when using JuiceFS that Gravitino Fileset could overcome or solve? Sharing some of them would help others understand this feature. Thank you!

shaofengshi, Aug 08 '24 08:08

@Suave Is the Hadoop SDK a better choice for big data scenarios?

Yes, the JuiceFS Java SDK works better in the Hadoop ecosystem; it is compatible with both Hadoop 2.x and 3.x.

Suave, Aug 08 '24 22:08

It still does not support JuiceFS, either through a jfs provider or any other provider. Here are the reproduction steps:

  1. Put juicefs-hadoop-1.2.3.jar into /root/gravitino/catalogs/hadoop/libs/.
  2. Create the following core-site.xml, put it into both /root/gravitino/catalogs/hadoop/conf/core-site.xml and /etc/hive/conf/core-site.xml, and set the environment variable HADOOP_CONF_DIR=/etc/hive/conf:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>jfs://dev03</value>
  </property>
  <property>
    <name>fs.jfs.impl</name>
    <value>io.juicefs.JuiceFileSystem</value>
  </property>
  <property>
    <name>fs.AbstractFileSystem.jfs.impl</name>
    <value>io.juicefs.JuiceFS</value>
  </property>
  <property>
    <name>juicefs.meta</name>
    <value>redis://xxxxxx/3</value>
  </property>
  <property>
    <name>juicefs.memory-size</name>
    <value>1024</value>
  </property>
</configuration>
  3. Restart Gravitino.
  4. Create a catalog with filesystem-providers set to jfs and location set to jfs://dev03/tmp/.
  5. Open the catalog. It throws the following error:
{"code":1006,"type":"UnsupportedOperationException","message":"Failed to operate schema(s) operation [LIST] under catalog [testsss], reason [File system providers [jfs] not found in the classpath. Please make sure the file system provider is in the classpath.]","stack":[
  "java.lang.UnsupportedOperationException: File system providers [jfs] not found in the classpath. Please make sure the file system provider is in the classpath.",
  "at org.apache.gravitino.catalog.hadoop.fs.FileSystemUtils.getFileSystemProviders(FileSystemUtils.java:84)",
  "at org.apache.gravitino.catalog.hadoop.HadoopCatalogOperations.initialize(HadoopCatalogOperations.java:147)",
  "at org.apache.gravitino.catalog.hadoop.SecureHadoopCatalogOperations.initialize(SecureHadoopCatalogOperations.java:92)",
  "at org.apache.gravitino.connector.BaseCatalog.ops(BaseCatalog.java:172)",
  "at org.apache.gravitino.catalog.CatalogManager$CatalogWrapper.asSchemas(CatalogManager.java:241)",
  "at org.apache.gravitino.catalog.CatalogManager$CatalogWrapper.lambda$doWithSchemaOps$0(CatalogManager.java:145)",
  "at org.apache.gravitino.utils.IsolatedClassLoader.withClassLoader(IsolatedClassLoader.java:86)",
  "at org.apache.gravitino.catalog.CatalogManager$CatalogWrapper.doWithSchemaOps(CatalogManager.java:143)",
  "at org.apache.gravitino.catalog.SchemaOperationDispatcher.lambda$listSchemas$1(SchemaOperationDispatcher.java:78)",
  "at org.apache.gravitino.catalog.OperationDispatcher.doWithCatalog(OperationDispatcher.java:100)",
  "at org.apache.gravitino.catalog.SchemaOperationDispatcher.listSchemas(SchemaOperationDispatcher.java:76)",
  "at org.apache.gravitino.hook.SchemaHookDispatcher.listSchemas(SchemaHookDispatcher.java:54)",
  "at org.apache.gravitino.catalog.SchemaNormalizeDispatcher.listSchemas(SchemaNormalizeDispatcher.java:48)",
  "at org.apache.gravitino.listener.SchemaEventDispatcher.listSchemas(SchemaEventDispatcher.java:77)",
  "at org.apache.gravitino.server.web.rest.SchemaOperations.lambda$listSchemas$0(SchemaOperations.java:91)",
  "at org.apache.gravitino.lock.TreeLockUtils.doWithTreeLock(TreeLockUtils.java:49)",
  "at org.apache.gravitino.server.web.rest.SchemaOperations.lambda$listSchemas$1(SchemaOperations.java:88)",
  "at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)",
  "at java.base/javax.security.auth.Subject.doAs(Subject.java:439)",
  "at org.apache.gravitino.utils.PrincipalUtils.doAs(PrincipalUtils.java:39)",
  "at org.apache.gravitino.server.web.Utils.doAs(Utils.java:188)",
  "at org.apache.gravitino.server.web.rest.SchemaOperations.listSchemas(SchemaOperations.java:83)",
  "at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)",
  "at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)",
  "at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)",
  "at java.base/java.lang.reflect.Method.invoke(Method.java:568)",
  "at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)",
  "at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:146)",
  "at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:189)",
  "at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:176)",
  "at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:93)",
  "at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:478)",
  "at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:400)",
  "at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81)",
  "at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:256)",
  "at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)",
  "at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)",
  "at org.glassfish.jersey.internal.Errors.process(Errors.java:292)",
  "at org.glassfish.jersey.internal.Errors.process(Errors.java:274)",
  "at org.glassfish.jersey.internal.Errors.process(Errors.java:244)",
  "at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)",
  "at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:235)",
  "at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:684)",
  "at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394)",
  "at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)",
  "at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:358)",
  "at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:311)",
  "at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)",
  "at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)",
  "at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656)",
  "at org.apache.gravitino.server.authentication.AuthenticationFilter.doFilter(AuthenticationFilter.java:86)",
  "at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)",
  "at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)",
  "at org.apache.gravitino.server.web.VersioningFilter.doFilter(VersioningFilter.java:111)",
  "at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)",
  "at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)",
  "at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552)",
  "at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)",
  "at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:600)",
  "at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)",
  "at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)",
  "at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)",
  "at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)",
  "at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)",
  "at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)",
  "at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)",
  "at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)",
  "at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)",
  "at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)",
  "at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)",
  "at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)",
  "at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)",
  "at org.eclipse.jetty.server.Server.handle(Server.java:516)",
  "at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)",
  "at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)",
  "at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)",
  "at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)",
  "at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)",
  "at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)",
  "at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)",
  "at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)",
  "at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)",
  "at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)",
  "at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)",
  "at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)",
  "at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)",
  "at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)",
  "at java.base/java.lang.Thread.run(Thread.java:833)"]}
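The error comes from `FileSystemUtils.getFileSystemProviders`. This suggests that the `filesystem-providers` value is matched by name against Gravitino's own provider plugins, not against the Hadoop `fs.jfs.impl` setting. If so, putting the JuiceFS Hadoop jar on the classpath is not enough on its own. The following is a simplified, self-contained Java model of such a name-based lookup. The `FileSystemProvider` type and `SimpleProvider` class are hypothetical stand-ins, not Gravitino's actual interface. A real server would discover providers through a plugin mechanism such as `java.util.ServiceLoader` rather than a hard-coded list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class ProviderLookup {
    // Hypothetical provider abstraction: a short name used in the catalog
    // property plus the URI scheme the provider serves.
    interface FileSystemProvider {
        String name();
        String scheme();
    }

    static final class SimpleProvider implements FileSystemProvider {
        private final String name;
        private final String scheme;
        SimpleProvider(String name, String scheme) { this.name = name; this.scheme = scheme; }
        public String name() { return name; }
        public String scheme() { return scheme; }
    }

    // Resolves a comma-separated list such as "s3,jfs" against the discovered
    // providers; throws if any requested name has no matching provider.
    static Map<String, FileSystemProvider> resolve(String requested, List<FileSystemProvider> discovered) {
        Map<String, FileSystemProvider> byName = new HashMap<>();
        for (FileSystemProvider p : discovered) {
            byName.put(p.name(), p);
        }
        Set<String> wanted = Arrays.stream(requested.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toCollection(LinkedHashSet::new));
        List<String> missing = new ArrayList<>();
        for (String n : wanted) {
            if (!byName.containsKey(n)) missing.add(n);
        }
        if (!missing.isEmpty()) {
            throw new UnsupportedOperationException(
                    "File system providers " + missing + " not found in the classpath.");
        }
        Map<String, FileSystemProvider> result = new LinkedHashMap<>();
        for (String n : wanted) result.put(n, byName.get(n));
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical discovered providers: the JuiceFS jar adds io.juicefs.JuiceFileSystem
        // to the classpath, but under this model it contributes no provider named "jfs".
        List<FileSystemProvider> discovered = List.of(
                new SimpleProvider("local", "file"),
                new SimpleProvider("hdfs", "hdfs"));
        try {
            resolve("jfs", discovered);
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this reading, supporting JuiceFS would mean shipping a provider whose name and scheme are both `jfs` and which delegates to `io.juicefs.JuiceFileSystem`. Whether Gravitino's real provider interface looks like this should be checked against its source.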


Li-GL, Apr 16 '25 02:04