marklogic-contentpump

support for -custom-uri on import

Open starhound opened this issue 6 months ago • 5 comments

Please add support for specifying a custom URI when importing compressed manuals, or provide a proper way to override the default URI naming behavior.

I have followed: https://docs.marklogic.com/guide/mlcp-guide/en/importing-content-into-marklogic-server/controlling-database-uris-during-ingestion/transforming-the-default-uri.html

None of the configurations I have tried work. Perhaps my syntax is incorrect; it is not obvious from the documentation which flavor of regular expressions is used (Perl, PCRE?).

I need to import compressed manuals (all .zip for now, possibly gzip later) and make minor alterations to the URI, because it includes the .zip extension by default.

java.lang.IllegalArgumentException: Invalid option argument for output_uri_replace :Boeing 777 Test Manual.zip,TESTPATH43/Boeing_777_Test_Manual

My filename is Boeing 777 Test Manual.zip and it needs to become /USER_INPUT_ROOT/MANUAL_NAME/<files>.

I have a Python API that acts as a wrapper for MLCP, and it works without issue except for this behavior.

import os

# MLCP and invoke_mlcp() are defined elsewhere in the wrapper.
def import_data(database, root_path, files, marklogic_connection):
    # normalize root_path once: ensure a trailing slash, drop any leading slash
    root_path = root_path if root_path.endswith("/") else f"{root_path}/"
    root_path = root_path[1:] if root_path.startswith("/") else root_path
    for file in files:
        # convert spaces in file.filename to underscores
        file_name = file.filename.replace(" ", "_")
        # remove only the final extension, so names containing other dots stay intact
        file_name = os.path.splitext(file_name)[0]
        file_uri = f"{root_path}{file_name}"
        print(f"Importing {file.filename} to {file_uri}")
        cmd = [
            MLCP,
            "import",
            f"-host {marklogic_connection['host']}",
            f"-port {marklogic_connection['port']}",
            f"-database {database}",
            f"-username {marklogic_connection['username']}",
            f"-password {marklogic_connection['password']}",
            "-input_compressed true",
            "-mode local",
            "-base_path /",
            "-input_compression_codec zip",
            "-ssl false",
            f"-input_file_path '/tmp/{file.filename}'",
            # this is the option rejected with the IllegalArgumentException above
            f"-output_uri_replace '{file.filename},{file_uri}'"
        ]
        invoke_mlcp(cmd)
        # remove the file from the /tmp directory
        os.remove(f"/tmp/{file.filename}")

invoke_mlcp() simply runs the provided MLCP bash script in a subprocess.
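
For reference, here is a minimal sketch of how the option value might be built. It goes by my reading of the linked guide, which describes -output_uri_replace as a comma-separated list of pattern,'string' pairs: the pattern is a Java regular expression (java.util.regex.Pattern) and the replacement string must be enclosed in single quotes. That reading may be wrong. The helper name build_output_uri_replace is hypothetical, and the sketch assumes the default URI starts with the archive's path on disk, which I have not confirmed.

```python
import re


def build_output_uri_replace(archive_path: str, target_uri: str) -> str:
    """Build a value for -output_uri_replace (hypothetical helper).

    Assumed format, per the linked guide: pattern,'replacement'
    where pattern is a regular expression and the replacement is
    wrapped in single quotes.
    """
    # The option is a comma-separated list, so a comma inside either part
    # would be read as a separator; a single quote would end the replacement.
    if "," in archive_path or "," in target_uri or "'" in target_uri:
        raise ValueError("commas and single quotes are not supported in this sketch")
    # Escape regex metacharacters (such as the dot in ".zip") so the path is
    # matched literally. Backslash escapes before non-alphabetic characters
    # are also accepted by Java's regex syntax.
    pattern = re.escape(archive_path)
    return f"{pattern},'{target_uri}'"


value = build_output_uri_replace(
    "/tmp/Boeing 777 Test Manual.zip",
    "/TESTPATH43/Boeing_777_Test_Manual",
)
# Flag and value as separate list items, so no shell quoting is involved.
uri_args = ["-output_uri_replace", value]
```

Passing the flag and its value as separate list elements only helps if invoke_mlcp() hands the list to the subprocess without going through a shell. If it joins the list into one shell command string, the value would need extra quoting so the inner single quotes survive.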

Thank you

starhound • Aug 15 '24 16:08