pxf
pxf copied to clipboard
Named query and MongoDB
Hello! We successfully using PXF with MongoDB. Thank you!
Now we've faced with new type of queries: we need to execute the query in MongoDB and get the result. Ordinary select queries using external table is not working for us because PXF is not supported "ORDER BY" and "LIMIT" for external database. We can't do:
SELECT * FROM ext_table_some_collection WHERE field > 10 ORDER BY field LIMIT 10
and get the result of:
db.some_collection.find({},{"field": {$gt: 10}}).sort({"field":1}).limit(10)
We've found "Named Queries" in the documentation.
Could you please give an example of how we could use this in our case described above?
My current try was fail:
- I put
example_query.sql
topxf/server/mongo_test/
. The content was:
db.some_collection.find({},{"field": {$gt: 10}}).sort({"field":1}).limit(10)
- After restarting pxf I created the external table:
CREATE EXTERNAL TABLE test ( _id text, field int)
LOCATION ('pxf://query:example_query?PROFILE=Jdbc&SERVER=mongo_test')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
- And the query return error:
select * from test limit 1;
ERROR: remote component error (500) from '127.0.0.1:5888': type Exception report message ERROR: No schema defined. A schema is needed when performing joins or complex SQL. Put the parameter rebuildschema=true in your JDBC URL at least for the first time connecting to rebuild the schema. Example: jdbc:mongo:localhost:27017?rebuildschema=true. The default schema location is in the file mongo_<dbname>.xml. Use the schema parameter to set a file location (e.g. schema=mongo.xml) to store the schema. See connection parameters at http://www.unityjdbc.com/mongojdbc/ for more details. description The server encountered an internal error that prevented it from fulfilling this request. exception java.io.IOException: ERROR: No schema defined. A schema is needed when performing joins or complex SQL. Put the parameter rebuildschema=true in your JDBC URL at least for the first time connecting to rebuild the schema. Example: jdbc:mongo:localhost:27017?rebuildschema=true. The default schema location is in the file mongo_<dbname>.xml. Use the schema parameter to set a file location (e.g. schema=mongo.xml) to store the schema. See connection parameters at http://www.unityjdbc.com/mongojdbc/ for more details. (libchurl.c:920) (seg0 slice1 172.18.0.2:40000 pid=7258) (libchurl.c:920)
- I see in PXF logs the query was read by PXF
Please help:
-
What query should be in sql file? It should be native query for external database (for mongo:
db.col.find()
) or it should be sql language queries with collection name and field names (for mongo:select * from col
) and it will be converted to mongo query (db.col.find()
) by PXF? -
Do you have thoughts about what
ERROR: No schema defined
means in the error above in PXF context? -
Could you help to make PXF Named Queries work with MongoDB?
Thanks!
Hy @yvaa , thank you for using PXF and we're glad you got it working with MongoDB. Will try to help you with your question here.
-
Why are you saying you cannot do
SELECT * FROM ext_table_some_collection WHERE field > 10 ORDER BY field LIMIT 10
? In the case of JDBC profile, it will do predicate pushdown, so PXF will retrieve all rows wherefiled > 10
and give them to the GPDB, where they all will be sorted and the top 10 will be returned. If this does not work, please let us know what specific error you're getting. -
Arguably, running this SQL completely within MongoDB might be faster, and then you can use Named Query feature to have PXF submit your pre-defined query to MongoDB and just return 10 resulting records to GPDB. Since you're using the JDBC profile, the text of your named query (contents of
example_query.sql
file) must be a valid SQL.
You can embed the whole query as a named query putting the following line in example_query.sql
:
SELECT id, field FROM collection WHERE field > 10 ORDER BY field LIMIT 10
in this case the following query will be actually sent to MongoDB by the JDBC profile via Mongo JDBC driver:
SELECT id, field FROM (
SELECT id, field FROM collection WHERE field > 10 ORDER BY field LIMIT 10
) pxfsubquery
In your case, you have to embed WHERE statement in your named query since you also want to do ORDER BY followed by LIMIT. If there were no LIMIT requirement, then you could've left WHERE statement to be applied dynamically. I can elaborate on this separately later.
- PXF does not do anything specific to MongoDB, it only works with the JDBC drivers. If you're saying it already works with MongoDB, it means you have your JDBC properties correctly configured in
jdbc-site.xml
. However, the error you present indicates that JDBC driver had problems either connecting to MongoDB or executing the query. Please change the text of your named query file to the valid SQL illustrated above and let us know whether that solved the problem.
Hello @denalex !
Why are you saying you cannot do SELECT * FROM ext_table_some_collection WHERE field > 10 ORDER BY field LIMIT 10 ? In the case of JDBC profile,
- it will do predicate pushdown, so PXF will retrieve all rows where filed > 10
- and give them to the GPDB, where they all will be sorted
- and the top 10 will be returned.
You're right about what PXF do and I see this in MongoDB queries log. But we expect/need different behavior. In short terms it could be described as "do in MongoDB exactly what described by SQL query and get the result". It means that step 2 and 3 (I've added numbers to your description in the quote above) should be executed on MongoDB side.
In the case you described PXF get ALL rows where "field > 10" from MongoDB. If we have millions of rows where "field > 10" PXF will get all this rows to sort it and get top 10. It's inefficient way to solve the task.
Please change the text of your named query file to the valid SQL illustrated above and let us know whether that solved the problem.
I tried the valid SQL query as Named Query. I have the same error: ERROR: No schema defined
from jdbc driver. PXF read the query and then driver raises this error without sending the query to the MongoDB server.
If you're saying it already works with MongoDB, it means you have your JDBC properties correctly configured in jdbc-site.xml.
Yes. If we use ordinary LOCATION ('pxf://collection?PROFILE=JDBC&SERVER=mongo_server')
it works.
Also we tried to create a collection as view to pack full query we need into it and then use PXF on it. But JDBC driver rise an error:
Namespace collection_view is a view. Legacy find operations are not supported on views. Only clients which support the find command can be used to query views.
So we're in dead end and think about replace PXF to something else. I understand that it is probably not the PXF problem and this is the jdbc driver problem. But in sum it's not working for MongoDB as described in PXF documentation. This thread is the last try and if you can advice something it will be great.
Can you please provide the following:
- Full content of your
jdbc-site.xml
file - Name and version of the Mongo JDBC driver you're using
- MongoDB version
- Script to create / populate a MongoDB collection
We will try to reproduce the issue you're facing.
Full content of your jdbc-site.xml file
https://github.com/yvaa/pxf-mongodb/blob/master/jdbc-site.xml
Name and version of the Mongo JDBC driver you're using
https://github.com/yvaa/pxf-mongodb/blob/master/unityjdbc_mongo.jar
MongoDB version Script to create / populate a MongoDB collection
https://github.com/yvaa/pxf-mongodb/blob/master/README.md
We will try to reproduce the issue you're facing.
Many thanks!
Hello @denalex!
We will try to reproduce the issue you're facing.
Did you have time to try?
Hi @yvaa, I was actually able to reproduce the issue, but we still haven't found the root cause. I will try to spend sometime looking into this issue next week. Apologies for the delay in resolving this issue