parseable icon indicating copy to clipboard operation
parseable copied to clipboard

PR for issue #824

Open parmesant opened this issue 1 year ago • 1 comments

Fixes #824 (Temporarily)

Description

PR suggests a temporary fix for issue #824. The issue might lie in the way an internal dependency is parsing the path for a given parquet file on Windows as per my conversation with @nikhilsinhaparseable.

Windows uses \ as path separator which, if present in a string, appears as an escaped \\. Rust is able to reach the manifest.json by following the path given in .stream.json as ..."manifest_list":[{"manifest_path":"G:\\projects\\local_parseable\\parseable\\data\\test/date=2024-07-06/manifest.json",.... By extension, it should be able to read the parquet files from their path present in manifest.json as ..."files":[{"file_path":"G:\\projects\\local_parseable\\parseable\\data\\test/date=2024-07-06/hour=06/minute=52/DESKTOP-C8E53PR.data.Z6KsJTjB2iryjtr.parquet",... Looking at partitioned_files, it is clear that the issue is not with how Rust is treating the path, but probably with how the physical plan is executed by Datafusion.

Excerpts from the logs-

[2024-07-06T14:33:06Z DEBUG datafusion::physical_planner] Optimized physical plan:
    SortPreservingMergeExec: [timestamp@0 ASC NULLS LAST] ....

                 .... ParquetExec: file_groups={1 group: [[G:%5Cprojects%5Clocal_parseable%5Cparseable%5Cdata%5Ctest/date=2024-07-06/hour=14/minute=32/DESKTOP-C8E53PR.data.IdtF1BXkh0THAzp.parquet]]}, projection=[p_timestamp],....

[2024-07-06T14:33:06Z DEBUG datafusion_physical_plan::aggregates::row_hash] Creating GroupedHashAggregateStream

[2024-07-06T14:33:06Z DEBUG actix_web::middleware::logger] Error in response: Execute(Datafusion(External(ParquetError(General("AsyncChunkReader::get_bytes error: Object at location G:\\%5Cprojects%5Clocal_parseable%5Cparseable%5Cdata%5Ctest\\date=2024-07-06\\hour=14\\minute=32\\DESKTOP-C8E53PR.data.IdtF1BXkh0THAzp.parquet not found: The system cannot find the path specified. (os error 3)")))))

Another strange thing is that, in the error, the path appears to have both the escaped backslash \\ as well as the hex notation of a backslash %5C. Which could indicate that the path is getting corrupted somehow.

For now, a temporary fix in the form of string replacement can be used (since windows works just fine with forward slashes in place of backslashes)

This PR also fixes issues present with the powershell script which is in charge or downloading and setting-up Parseable on a Windows machine. Over time, the nomenclature rules of new releases changed which broke the script. Similar issues exist in the .sh script as well but it relies on some zsh specific commands and thus could not be fixed by me (I would propose changing it to a bash script since macOS supports that also).


This PR has:

  • [x] been tested to ensure log ingestion and log query works.
  • [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • [x] added documentation for new or modified features or behaviors.

parmesant avatar Jul 06 '24 14:07 parmesant

CLA Assistant Lite bot:
Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment same as the below format.


I have read the CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting recheck in this Pull Request

github-actions[bot] avatar Jul 06 '24 14:07 github-actions[bot]

Closing PR in favour of taking more time to find (and hoepfully fix) the actual issue coming from Datafusion.

parmesant avatar Jul 09 '24 02:07 parmesant