pxf icon indicating copy to clipboard operation
pxf copied to clipboard

Named query and MongoDB

Open yv5125 opened this issue 4 years ago • 6 comments

Hello! We successfully using PXF with MongoDB. Thank you!

Now we've faced with new type of queries: we need to execute the query in MongoDB and get the result. Ordinary select queries using external table is not working for us because PXF is not supported "ORDER BY" and "LIMIT" for external database. We can't do:

SELECT * FROM ext_table_some_collection WHERE field > 10 ORDER BY field LIMIT 10

and get the result of:

db.some_collection.find({},{"field": {$gt: 10}}).sort({"field":1}).limit(10)

We've found "Named Queries" in the documentation.

Could you please give an example of how we could use this in our case described above?

My current try was fail:

  1. I put example_query.sql to pxf/server/mongo_test/. The content was:
db.some_collection.find({},{"field": {$gt: 10}}).sort({"field":1}).limit(10)
  1. After restarting pxf I created the external table:
CREATE EXTERNAL TABLE test ( _id text, field int)
LOCATION ('pxf://query:example_query?PROFILE=Jdbc&SERVER=mongo_test')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
  1. And the query return error:
select * from test limit 1;
ERROR:  remote component error (500) from '127.0.0.1:5888':  type  Exception report   message   ERROR: No schema defined.  A schema is needed when performing joins or complex SQL.  Put the parameter rebuildschema=true in your JDBC URL at least for the first time connecting to rebuild the schema.  Example: jdbc:mongo:localhost:27017?rebuildschema=true.  The default schema location is in the file mongo_<dbname>.xml.  Use the schema parameter to set a file location (e.g. schema=mongo.xml) to store the schema.  See connection parameters at http://www.unityjdbc.com/mongojdbc/ for more details.    description   The server encountered an internal error that prevented it from fulfilling this request.    exception   java.io.IOException: ERROR: No schema defined.  A schema is needed when performing joins or complex SQL.  Put the parameter rebuildschema=true in your JDBC URL at least for the first time connecting to rebuild the schema.  Example: jdbc:mongo:localhost:27017?rebuildschema=true.  The default schema location is in the file mongo_<dbname>.xml.  Use the schema parameter to set a file location (e.g. schema=mongo.xml) to store the schema.  See connection parameters at http://www.unityjdbc.com/mongojdbc/ for more details. (libchurl.c:920)  (seg0 slice1 172.18.0.2:40000 pid=7258) (libchurl.c:920)
  1. I see in PXF logs the query was read by PXF

Please help:

  1. What query should be in sql file? It should be native query for external database (for mongo: db.col.find()) or it should be sql language queries with collection name and field names (for mongo: select * from col) and it will be converted to mongo query (db.col.find()) by PXF?

  2. Do you have thoughts about what ERROR: No schema defined means in the error above in PXF context?

  3. Could you help to make PXF Named Queries work with MongoDB?

Thanks!

yv5125 avatar Apr 07 '20 13:04 yv5125

Hy @yvaa , thank you for using PXF and we're glad you got it working with MongoDB. Will try to help you with your question here.

  1. Why are you saying you cannot do SELECT * FROM ext_table_some_collection WHERE field > 10 ORDER BY field LIMIT 10 ? In the case of JDBC profile, it will do predicate pushdown, so PXF will retrieve all rows where filed > 10 and give them to the GPDB, where they all will be sorted and the top 10 will be returned. If this does not work, please let us know what specific error you're getting.

  2. Arguably, running this SQL completely within MongoDB might be faster, and then you can use Named Query feature to have PXF submit your pre-defined query to MongoDB and just return 10 resulting records to GPDB. Since you're using the JDBC profile, the text of your named query (contents of example_query.sql file) must be a valid SQL.

You can embed the whole query as a named query putting the following line in example_query.sql:

SELECT id, field FROM collection WHERE field > 10 ORDER BY field LIMIT 10

in this case the following query will be actually sent to MongoDB by the JDBC profile via Mongo JDBC driver:

SELECT id, field FROM (
      SELECT id, field FROM collection WHERE field > 10 ORDER BY field LIMIT 10
) pxfsubquery

In your case, you have to embed WHERE statement in your named query since you also want to do ORDER BY followed by LIMIT. If there were no LIMIT requirement, then you could've left WHERE statement to be applied dynamically. I can elaborate on this separately later.

  1. PXF does not do anything specific to MongoDB, it only works with the JDBC drivers. If you're saying it already works with MongoDB, it means you have your JDBC properties correctly configured in jdbc-site.xml. However, the error you present indicates that JDBC driver had problems either connecting to MongoDB or executing the query. Please change the text of your named query file to the valid SQL illustrated above and let us know whether that solved the problem.

denalex avatar Apr 08 '20 20:04 denalex

Hello @denalex !

Why are you saying you cannot do SELECT * FROM ext_table_some_collection WHERE field > 10 ORDER BY field LIMIT 10 ? In the case of JDBC profile,

  1. it will do predicate pushdown, so PXF will retrieve all rows where filed > 10
  2. and give them to the GPDB, where they all will be sorted
  3. and the top 10 will be returned.

You're right about what PXF do and I see this in MongoDB queries log. But we expect/need different behavior. In short terms it could be described as "do in MongoDB exactly what described by SQL query and get the result". It means that step 2 and 3 (I've added numbers to your description in the quote above) should be executed on MongoDB side.

In the case you described PXF get ALL rows where "field > 10" from MongoDB. If we have millions of rows where "field > 10" PXF will get all this rows to sort it and get top 10. It's inefficient way to solve the task.

Please change the text of your named query file to the valid SQL illustrated above and let us know whether that solved the problem.

I tried the valid SQL query as Named Query. I have the same error: ERROR: No schema defined from jdbc driver. PXF read the query and then driver raises this error without sending the query to the MongoDB server.

If you're saying it already works with MongoDB, it means you have your JDBC properties correctly configured in jdbc-site.xml.

Yes. If we use ordinary LOCATION ('pxf://collection?PROFILE=JDBC&SERVER=mongo_server') it works.

Also we tried to create a collection as view to pack full query we need into it and then use PXF on it. But JDBC driver rise an error:

Namespace collection_view is a view. Legacy find operations are not supported on views. Only clients which support the find command can be used to query views.

So we're in dead end and think about replace PXF to something else. I understand that it is probably not the PXF problem and this is the jdbc driver problem. But in sum it's not working for MongoDB as described in PXF documentation. This thread is the last try and if you can advice something it will be great.

yv5125 avatar Apr 09 '20 08:04 yv5125

Can you please provide the following:

  1. Full content of your jdbc-site.xml file
  2. Name and version of the Mongo JDBC driver you're using
  3. MongoDB version
  4. Script to create / populate a MongoDB collection

We will try to reproduce the issue you're facing.

denalex avatar Apr 10 '20 00:04 denalex

Full content of your jdbc-site.xml file

https://github.com/yvaa/pxf-mongodb/blob/master/jdbc-site.xml

Name and version of the Mongo JDBC driver you're using

https://github.com/yvaa/pxf-mongodb/blob/master/unityjdbc_mongo.jar

MongoDB version Script to create / populate a MongoDB collection

https://github.com/yvaa/pxf-mongodb/blob/master/README.md

We will try to reproduce the issue you're facing.

Many thanks!

yv5125 avatar Apr 10 '20 10:04 yv5125

Hello @denalex!

We will try to reproduce the issue you're facing.

Did you have time to try?

yv5125 avatar Jun 04 '20 09:06 yv5125

Hi @yvaa, I was actually able to reproduce the issue, but we still haven't found the root cause. I will try to spend sometime looking into this issue next week. Apologies for the delay in resolving this issue

frankgh avatar Jun 16 '20 14:06 frankgh