
Adds `TextStreamingDecoder`

Open keunwoochoi opened this issue 5 months ago • 14 comments

This is a specialized file opener + decoder that

  • works for various types of sources (s3, gcs, local path)
  • opens any text file and streams the content line by line.
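To make the "stream line by line" idea concrete, here is a minimal sketch for the local-path case only. It is not the PR's implementation: the function name `stream_lines` and the record layout (`text` plus a `metadata` dict) are assumptions made for illustration, and remote sources such as s3 or gcs would need a filesystem library that this sketch does not cover.

```python
from typing import Any, Dict, Iterator


def stream_lines(path: str, encoding: str = "utf-8") -> Iterator[Dict[str, Any]]:
    """Yield one record per line of a local text file.

    Hypothetical helper: the name and the record layout are assumptions,
    not the API added by this PR.
    """
    # The file stays open only while the generator is being consumed;
    # the `with` block closes it once iteration finishes or the
    # generator is closed.
    with open(path, "r", encoding=encoding) as f:
        for line_number, line in enumerate(f):
            yield {
                "text": line.rstrip("\n"),  # drop the trailing newline
                "metadata": {"file_path": path, "line_number": line_number},
            }
```

Because this is a generator, the file is read lazily: only one line is held in memory at a time, which is the point of streaming rather than loading the whole file.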

keunwoochoi avatar Jul 22 '25 02:07 keunwoochoi

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/data/1500

Note: Links to docs will display an error until the docs builds have been completed.

:x: 4 New Failures

As of commit 8832f44d84c0ec91ea2da34df4e1cb62908b08ae with merge base 92950795e0790eb74df995daf40b658e85fd2c9f:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot[bot] avatar Aug 15 '25 16:08 pytorch-bot[bot]

thanks for the review. made some changes & pushed three commits.

keunwoochoi avatar Aug 31 '25 21:08 keunwoochoi

(note to myself)


(1018 durations < 0.005s hidden.  Use -vv to show these durations.)
=========================== short test summary info ===========================
FAILED test/nodes/io/test_text_streaming_decoder.py::test_text_stream_metadata - PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmpcyw34h51\\test1.txt'
FAILED test/nodes/io/test_text_streaming_decoder.py::test_text_stream_state_management - PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmpe4dio47s\\test1.txt'
FAILED test/nodes/io/test_text_streaming_decoder.py::test_text_stream_empty_file - PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmpivu3sp7w\\normal.txt'
FAILED test/nodes/io/test_text_streaming_decoder.py::test_text_stream_encoding - PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmp73b21cs9\\utf8.txt'
FAILED test/nodes/io/test_text_streaming_decoder.py::test_error_handling - PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmp4oq398ov\\valid.txt'
===== 5 failed, 383 passed, 15 skipped, 27 warnings in 1320.71s (0:22:00) =====
Error: Process completed with exit code 1.

keunwoochoi avatar Sep 10 '25 20:09 keunwoochoi

@ramanishsingh giving it another shot by explicitly calling .shutdown() on the nodes.
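For context on why an explicit shutdown might help: on Windows, a file that is still open cannot be deleted, so cleaning up a temporary directory while a reader holds a handle raises `PermissionError: [WinError 32]`. The sketch below illustrates that pattern with a hypothetical `LineReader` class. It is a stand-in, not the actual node classes in this PR, and its `shutdown()` only mirrors the idea described above.

```python
import os
import tempfile
from typing import Optional, TextIO


class LineReader:
    """Hypothetical stand-in for a node that keeps a file handle open."""

    def __init__(self, path: str) -> None:
        self._file: Optional[TextIO] = open(path, "r", encoding="utf-8")

    def next_line(self) -> str:
        if self._file is None:
            raise RuntimeError("reader has been shut down")
        # Returns an empty string at end of file.
        return self._file.readline().rstrip("\n")

    def shutdown(self) -> None:
        # Close the handle so the OS releases the file.
        if self._file is not None:
            self._file.close()
            self._file = None


def read_first_line_and_cleanup() -> str:
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "test1.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("hello\nworld\n")

        reader = LineReader(path)
        try:
            return reader.next_line()
        finally:
            # Release the handle before the temporary directory is removed;
            # on Windows, skipping this step can raise WinError 32.
            reader.shutdown()
```

On Linux and macOS the cleanup usually succeeds even with an open handle, which would explain a failure that shows up only in the Windows CI job.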

keunwoochoi avatar Oct 10 '25 01:10 keunwoochoi

hi @ramanishsingh, can we try the CI with the latest change?

also, i will have access to a windows laptop in a few days, in case the latest commit doesn't fix the issue.

keunwoochoi avatar Oct 13 '25 23:10 keunwoochoi

> hi @ramanishsingh, can we try the CI with the latest change?
>
> also, i will have access to a windows laptop in a few days, in case the latest commit doesn't fix the issue.

Hi @keunwoochoi , somehow I can't run the CI. Trying to figure out.

ramanishsingh avatar Oct 14 '25 03:10 ramanishsingh

i just tested the latest change on my Windows machine, and it passes the test! let's see how it goes with the CI.

keunwoochoi avatar Oct 14 '25 19:10 keunwoochoi

a reminder ^ and also a question - is it going to be tested with the CI pipeline for Windows? (that was the only problem it used to have.)

keunwoochoi avatar Oct 18 '25 21:10 keunwoochoi

👀

keunwoochoi avatar Oct 28 '25 00:10 keunwoochoi

Hi @keunwoochoi , this repo has recently moved from pytorch to meta-pytorch and somehow the CIs are not running. We haven't been able to devote cycles for solving this yet.

ramanishsingh avatar Oct 28 '25 02:10 ramanishsingh

oh i see. yes i noticed that, thanks for the explanation.

keunwoochoi avatar Oct 28 '25 02:10 keunwoochoi

hi all! is there any update? 👀

keunwoochoi avatar Nov 10 '25 01:11 keunwoochoi

is the CI fixed? cc' @aelavender that'd be so nice.. i'm still waiting!!

keunwoochoi avatar Nov 20 '25 03:11 keunwoochoi

Thanks for initiating the CI.

Actually, the previous issue with my PR is fixed. The new failure is not related to my change.

test/nodes/test_snapshot_store.py::TestQueueSnapshotStore::test_thread_dead_error FAILED [ 99%]

Can anyone have a look at this? Or perhaps simply re-run the test? cc' @aelavender

keunwoochoi avatar Nov 26 '25 02:11 keunwoochoi