
Fix for reading exported parquet

Open · ilongin opened this pull request 7 months ago · 4 comments

Fixes an issue when reading a parquet file that was created by datachain itself and therefore already contains source fields. Previously, reading such a file added duplicate source fields; this PR avoids that, even when the `source=True` flag is set.
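A minimal sketch of the deduplication idea in plain Python. This is not datachain's actual code (the change lives in `src/datachain/lib/arrow.py`, whose internals are not shown here). It assumes, hypothetically, that exported source fields appear as flattened columns sharing a `source__` prefix; `SOURCE_PREFIX`, `SOURCE_COLUMNS`, and `read_columns` are illustrative names only.

```python
# Hypothetical sketch: NOT datachain's implementation.
# Assumption: source fields are flattened into columns with this prefix.
SOURCE_PREFIX = "source__"
# Hypothetical source columns a reader would add when source=True.
SOURCE_COLUMNS = ["source__path", "source__size"]


def read_columns(file_columns, source=True):
    """Return the output column list for reading a parquet file.

    Source columns are prepended only when requested AND the file does
    not already carry them (e.g. it was exported by datachain itself),
    so they never appear twice.
    """
    already_has_source = any(c.startswith(SOURCE_PREFIX) for c in file_columns)
    if source and not already_has_source:
        return SOURCE_COLUMNS + list(file_columns)
    return list(file_columns)


# A plain parquet file gets source columns added.
print(read_columns(["id", "label"]))
# → ['source__path', 'source__size', 'id', 'label']

# A re-read export keeps its existing source columns, without duplicates.
print(read_columns(["source__path", "source__size", "id", "label"]))
# → ['source__path', 'source__size', 'id', 'label']
```

The check is on the file's own schema, so `source=True` becomes "make sure source fields exist" rather than "always append source fields".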

ilongin, Apr 30 '25

Deploying datachain-documentation with Cloudflare Pages

Latest commit: e560b08
Status: ✅  Deploy successful!
Preview URL: https://e36e01c3.datachain-documentation.pages.dev
Branch Preview URL: https://ilongin-1066-fix-reading-exp.datachain-documentation.pages.dev


@ilongin are the tests broken?

@ilongin should we actually rewrite source if source is True? 🤔 since the existing source is kinda wrong by now ...

shcheklein, Apr 30 '25

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 88.69%. Comparing base (ff97825) to head (e560b08). Report is 1 commit behind head on main.


@@           Coverage Diff           @@
##             main    #1071   +/-   ##
=======================================
  Coverage   88.68%   88.69%           
=======================================
  Files         152      152           
  Lines       13606    13609    +3     
  Branches     1893     1894    +1     
=======================================
+ Hits        12067    12070    +3     
  Misses       1093     1093           
  Partials      446      446           
| Flag | Coverage Δ |
|---|---|
| datachain | 88.62% <100.00%> (+<0.01%) :arrow_up: |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| src/datachain/lib/arrow.py | 98.80% <100.00%> (+0.02%) :arrow_up: |

codecov[bot], May 05 '25

> @ilongin should we actually rewrite source if source is True? 🤔 since the existing source is kinda wrong by now ...

Makes sense; I've now overwritten it with the new source.
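The overwrite behavior discussed in this exchange can be sketched in the same way: instead of keeping whatever source the exported file carries, the stale source fields are dropped and replaced with fields describing the file actually being read. This is again a hypothetical illustration, not datachain's code; it assumes flattened `source__`-prefixed columns, and `overwrite_source` is an illustrative name, not a datachain function.

```python
# Hypothetical sketch: NOT datachain's implementation.
# Assumption: source fields are flattened into "source__"-prefixed keys.
SOURCE_PREFIX = "source__"


def overwrite_source(row, new_source):
    """Replace any stale source fields in `row` with `new_source`.

    `row` maps column names to values; `new_source` maps unprefixed
    source field names (e.g. "path") to values for the file being read.
    """
    # Drop the stale source fields carried over from the exported file.
    data = {k: v for k, v in row.items() if not k.startswith(SOURCE_PREFIX)}
    # Re-add source fields describing the file that is being read now.
    fresh = {SOURCE_PREFIX + k: v for k, v in new_source.items()}
    return {**fresh, **data}


row = {"source__path": "s3://bucket/original.csv", "id": 1}
print(overwrite_source(row, {"path": "s3://bucket/export.parquet"}))
# → {'source__path': 's3://bucket/export.parquet', 'id': 1}
```

Dropping before re-adding means a row never ends up with two competing source values, and the result always points at the parquet file that was just read rather than at the data it was originally exported from.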

ilongin, May 19 '25

@ilongin what is the status here?

shcheklein, Jun 07 '25