Allow for copying file while in write mode
Copy the file while it is still being written, as if writing had finished.
import jsonstreams

with jsonstreams.Stream(jsonstreams.Type.object, filename='foo') as s:
    for i in range(10000):
        s.write('foo', 'bar')
        if i % 1000 == 0:
            with jsonstreams.copy_from_stream(jsonstreams.Type.object, filename='foo') as copied_tmp_file_path:
                compress_and_upload_to_s3(copied_tmp_file_path)
compress_and_upload_to_s3(copied_tmp_file_path)
As you can see in the snippet above, I need to checkpoint the open stream to S3 at fixed intervals.
I suggest adding copy_from_stream: it takes the open stream, copies it as is, and closes the copy properly with a } if it is an object or a ] if it is an array.
I'll have to think about this. The complication is how to handle a deeply nested stream: the API is highly stateful, and I'm not sure how to go about it. It might require some copying of state.
So there are two things I see that make this really difficult:
1. the statefulness of the API, and the fact that a given subobject doesn't know whether it's a subobject, or how deep it is in the stack
2. file-like objects are generally not copyable, which means we'd need to seek and read back whatever is in them already before finishing them (which is not always possible; you can't seek sys.stdout, for example).
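To illustrate the second point, here is a minimal sketch of reading back what a file-like object already holds. read_back is a hypothetical helper, not part of jsonstreams, and it simply gives up (returns None) when the object cannot seek or was opened write-only.

```python
import io
import typing


def read_back(fd: typing.IO[str]) -> typing.Optional[str]:
    """Return everything written to fd so far, or None if that is not possible.

    Hypothetical helper for illustration only.
    """
    if not fd.seekable():
        # Pipes and terminals typically land here.
        return None
    position = fd.tell()
    try:
        fd.seek(0)
        return fd.read()
    except io.UnsupportedOperation:
        # Seekable but not readable, e.g. a file opened with mode 'w'.
        return None
    finally:
        # Restore the write position so later writes still append.
        fd.seek(position)


buf = io.StringIO()
buf.write('{"foo": "bar"')
so_far = read_back(buf)
buf.write('}')
```

The caller would still have to append the closing delimiters to so_far, which is where the first point (knowing the nesting depth) comes back in.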
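I can come up with some ideas for solving 2. You could do something like create a class:
import io
import typing
from dataclasses import dataclass, field

@dataclass
class DualStream:
    file: typing.IO[str]
    # default_factory gives every instance its own buffer
    shadow: io.StringIO = field(default_factory=io.StringIO)

    def write(self, data: str) -> None:
        # Mirror every write into the in-memory shadow
        self.file.write(data)
        self.shadow.write(data)

    def shadow_read(self) -> io.StringIO:
        # Hand back the current shadow and start a fresh one
        data = self.shadow
        self.shadow = io.StringIO()
        return data

    # You might need more API to make this work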
You could pass this in via the fd= argument, and then when you want to upload the data you could call the shadow_read method, which returns the shadowed stream and replaces it with a fresh one, so you could upload whatever it holds.
I'm still not sure how to solve 1 elegantly, though.
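As a rough illustration of how draining the shadow would behave, here is a self-contained sketch that uses only the standard library. The flush and close methods are assumptions about the extra API a writer might need, and an io.StringIO stands in for the real output file; nothing here is jsonstreams' own API.

```python
import io
import typing
from dataclasses import dataclass, field


@dataclass
class DualStream:
    file: typing.IO[str]
    shadow: io.StringIO = field(default_factory=io.StringIO)

    def write(self, data: str) -> int:
        # Mirror the write, then report what the real file accepted.
        self.shadow.write(data)
        return self.file.write(data)

    def flush(self) -> None:  # assumed extra API
        self.file.flush()

    def close(self) -> None:  # assumed extra API
        self.file.close()

    def shadow_read(self) -> str:
        # Return what was written since the last call and reset the shadow.
        data = self.shadow.getvalue()
        self.shadow = io.StringIO()
        return data


target = io.StringIO()  # stands in for the real output file
dual = DualStream(target)
chunks = []             # stands in for the upload destination

dual.write('{"a": 1')
chunks.append(dual.shadow_read())
dual.write(', "b": 2}')
chunks.append(dual.shadow_read())
```

Each call returns only the data written since the previous call, so the chunks are not valid JSON on their own; the receiving side would have to concatenate them in order.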
This will create a memory footprint problem because of the io.StringIO(): we can't assume beforehand that the file will fit into memory.
What if we copy the state entirely? As for the file content, we can copy it using shutil.copyfile, then open the copy in append mode and call the close method on it. Does this work, or am I missing something?
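A minimal sketch of the copy-then-close idea, using only the standard library. checkpoint and its open_scopes argument are hypothetical; the sketch assumes the caller knows which containers are currently open, has flushed the live file, and takes the checkpoint between complete writes (no dangling key or comma at the end of the file).

```python
import json
import os
import shutil
import tempfile
import typing

CLOSERS = {"object": "}", "array": "]"}


def checkpoint(src_path: str, dst_path: str,
               open_scopes: typing.Sequence[str]) -> None:
    """Copy src_path to dst_path and close every open container in the copy.

    open_scopes lists the currently open containers, outermost first.
    Hypothetical helper for illustration only.
    """
    shutil.copyfile(src_path, dst_path)
    with open(dst_path, "a") as copy:
        # Close the innermost container first.
        for scope in reversed(open_scopes):
            copy.write(CLOSERS[scope])


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "foo.json")
    dst = os.path.join(tmp, "foo.checkpoint.json")
    with open(src, "w") as live:
        live.write('{"items": [1, 2')
        live.flush()  # make sure the bytes are on disk before copying
        checkpoint(src, dst, ["object", "array"])
        live.write(", 3]}")  # the live file keeps going
    with open(dst) as f:
        snapshot = json.load(f)
    with open(src) as f:
        final = json.load(f)
```

This keeps memory use flat because the content is copied on disk, but it only works for real files, and it leaves open the question of how the stream would expose its nesting state.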