marklogic-contentpump
support for -custom-uri on import
Please add support for designating a custom URI when importing compressed manuals, or enable a proper override of the default URI labeling behavior.
I have followed: https://docs.marklogic.com/guide/mlcp-guide/en/importing-content-into-marklogic-server/controlling-database-uris-during-ingestion/transforming-the-default-uri.html
Various configurations do not work. Perhaps I have the syntax wrong; it is not obvious from the documentation which regular expression flavor is used (Perl, PCRE?).
I need to import compressed manuals (all .zip for now, possibly gzip later), and I need to make minor alterations to the URI because it includes the .zip extension by default.
java.lang.IllegalArgumentException: Invalid option argument for output_uri_replace :Boeing 777 Test Manual.zip,TESTPATH43/Boeing_777_Test_Manual
My filename is "Boeing 777 Test Manual.zip" and it needs to become /USER_INPUT_ROOT/MANUAL_NAME/<files>.
I have a Python API that acts as a wrapper for MLCP, and it works without issue except for this behavior.
def import_data(database, root_path, files, marklogic_connection):
    for file in files:
        # convert spaces in file.filename to underscores
        file_name = file.filename.replace(" ", "_")
        # remove file extension from the end of string
        file_name = file_name.split(".")[0]
        # if root_path has a trailing slash, do nothing, else add a trailing slash
        root_path = root_path if root_path.endswith("/") else f"{root_path}/"
        # if root_path has starting slash, remove it
        root_path = root_path[1:] if root_path.startswith("/") else root_path
        file_uri = f"{root_path}{file_name}"
        print(f"Importing {file.filename} to {file_uri}")
        cmd = [
            MLCP,
            "import",
            f"-host {marklogic_connection['host']}",
            f"-port {marklogic_connection['port']}",
            f"-database {database}",
            f"-username {marklogic_connection['username']}",
            f"-password {marklogic_connection['password']}",
            "-input_compressed true",
            "-mode local",
            "-base_path /",
            "-input_compression_codec zip",
            "-ssl false",
            f"-input_file_path '/tmp/{file.filename}'",
            f"-output_uri_replace '{file.filename},{file_uri}'"
        ]
        invoke_mlcp(cmd)
        # remove the file from the /tmp directory
        subprocess.run(["rm", f"/tmp/{file.filename}"])
invoke_mlcp() simply invokes the provided bash script in a subprocess.
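One possible cause of the "Invalid option argument" error: the MLCP guide writes the option value as pattern,'string', with the replacement wrapped in single quotes, and the value built above has no quotes around the replacement. MLCP is a Java tool, so the pattern is most likely read as a java.util.regex expression, in which case the "." in ".zip" should be escaped. The sketch below builds the value under those two assumptions. The helper names (escape_regex_literal, manual_uri, uri_replace_value) are hypothetical and not part of MLCP.

```python
from pathlib import PurePosixPath

# Characters treated as regex metacharacters (assumption: Java-style regex).
REGEX_META = set(r"\.[]{}()*+?^$|")


def escape_regex_literal(text: str) -> str:
    """Backslash-escape regex metacharacters so the text matches literally."""
    return "".join("\\" + ch if ch in REGEX_META else ch for ch in text)


def manual_uri(root_path: str, filename: str) -> str:
    """Build 'root/Manual_Name': drop only the last extension, replace spaces."""
    stem = PurePosixPath(filename).stem.replace(" ", "_")
    return f"{root_path.strip('/')}/{stem}"


def uri_replace_value(filename: str, target_uri: str) -> str:
    """Build a pattern,'replacement' value for -output_uri_replace.

    Assumption: the replacement must be wrapped in single quotes and the
    value is split on commas, so commas and quotes are rejected here.
    """
    if "," in filename or "," in target_uri or "'" in target_uri:
        raise ValueError("commas and single quotes are not supported in this sketch")
    return f"{escape_regex_literal(filename)},'{target_uri}'"


uri = manual_uri("/TESTPATH43/", "Boeing 777 Test Manual.zip")
value = uri_replace_value("Boeing 777 Test Manual.zip", uri)
print(value)
```

If the wrapper passes arguments without an intermediate shell, supplying the option name and the value as two separate list elements (for example ["-output_uri_replace", value]) would avoid a second layer of quoting. Whether that applies here depends on how invoke_mlcp() builds the command line.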
Thank you