glommio
Implementation of write_all/read_exact in DMA/Buffered file
Hi, I am only a beginner in this area, but as far as I understand, the write/read calls of the file implementations may write or read the provided data only partially (I suppose the async counterparts behave the same as the sync ones), and a developer who wants to be sure that all of the data is read or written has to write her own implementation. I understand that such functionality can be implemented using the futures crate and the provided streams, but that implies the overhead of copying data into a DmaBuffer. So it would probably be good to provide such methods as part of the API?
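For readers unfamiliar with the pattern being requested, here is a minimal synchronous sketch of a write_all-style loop over a writer that may accept only part of the data. The `WriteAt` trait, `write_all_at`, and `ChunkedFile` are hypothetical names used for illustration only; they are not glommio's API, and an async version over DmaBuffer would differ in its details.

```rust
use std::io::{self, ErrorKind};

/// Hypothetical stand-in for a file whose positional write may be partial.
/// NOT glommio's API; it only models "write some bytes at an offset".
pub trait WriteAt {
    fn write_at(&mut self, buf: &[u8], pos: u64) -> io::Result<usize>;
}

/// Keep calling `write_at` until every byte of `buf` has been written.
pub fn write_all_at<W: WriteAt>(file: &mut W, mut buf: &[u8], mut pos: u64) -> io::Result<()> {
    while !buf.is_empty() {
        match file.write_at(buf, pos) {
            // No progress: report an error instead of looping forever.
            Ok(0) => {
                return Err(io::Error::new(
                    ErrorKind::WriteZero,
                    "failed to write whole buffer",
                ))
            }
            // Partial write: advance past the bytes that were accepted.
            Ok(n) => {
                buf = &buf[n..];
                pos += n as u64;
            }
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(())
}

/// In-memory mock that accepts at most `chunk` bytes per call.
pub struct ChunkedFile {
    pub data: Vec<u8>,
    pub chunk: usize,
}

impl WriteAt for ChunkedFile {
    fn write_at(&mut self, buf: &[u8], pos: u64) -> io::Result<usize> {
        let n = buf.len().min(self.chunk);
        let start = pos as usize;
        if self.data.len() < start + n {
            self.data.resize(start + n, 0);
        }
        self.data[start..start + n].copy_from_slice(&buf[..n]);
        Ok(n)
    }
}
```

The mock makes the partial-write behaviour easy to exercise: with `chunk: 3`, writing an 11-byte slice takes four calls to `write_at`, and the loop hides that from the caller.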
Hello.
You might want to take a look at the StreamReader and StreamWriter structures (in the process of being renamed to DmaStreamReader and DmaStreamWriter, as we add buffered I/O).
They will read a file fully as a stream.
About partial reads and writes, they do happen, but usually they don't happen on a whim but rather due to some issue. For writes that is usually ENOSPC. For reads you might have hit EOF.
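The EOF case mentioned above is the one a read_exact-style helper has to turn into an error. Below is a minimal synchronous sketch, assuming a hypothetical `ReadAt` trait and an in-memory `ChunkedSource` mock (neither is part of glommio): a read that returns zero bytes before the buffer is full is reported as `UnexpectedEof`.

```rust
use std::io::{self, ErrorKind};

/// Hypothetical positional reader (NOT glommio's API).
/// Returns the number of bytes copied into `buf`; 0 means end of file.
pub trait ReadAt {
    fn read_at(&self, buf: &mut [u8], pos: u64) -> io::Result<usize>;
}

/// Fill `buf` completely, or fail with `UnexpectedEof` if the source ends first.
pub fn read_exact_at<R: ReadAt>(file: &R, buf: &mut [u8], pos: u64) -> io::Result<()> {
    let mut filled = 0;
    while filled < buf.len() {
        match file.read_at(&mut buf[filled..], pos + filled as u64) {
            // EOF before the buffer was filled.
            Ok(0) => {
                return Err(io::Error::new(
                    ErrorKind::UnexpectedEof,
                    "failed to fill whole buffer",
                ))
            }
            // Partial read: remember how far we got and ask again.
            Ok(n) => filled += n,
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(())
}

/// In-memory mock that returns at most `chunk` bytes per call.
pub struct ChunkedSource {
    pub data: Vec<u8>,
    pub chunk: usize,
}

impl ReadAt for ChunkedSource {
    fn read_at(&self, buf: &mut [u8], pos: u64) -> io::Result<usize> {
        let start = (pos as usize).min(self.data.len());
        let n = buf.len().min(self.chunk).min(self.data.len() - start);
        buf[..n].copy_from_slice(&self.data[start..start + n]);
        Ok(n)
    }
}
```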
Hi @glommer,
> You might want to take a look at the StreamReader and StreamWriter structures (in the process of being renamed to DmaStreamReader and DmaStreamWriter, as we add buffered I/O)
That is actually what I meant above when I wrote:
> I understand that such functionality can be implemented using the futures crate and the provided streams, but that implies the overhead of copying data into a DmaBuffer
But if you think it is not desirable to provide such functionality at the file level, you can, of course, close the issue.
I mean that, to speed up data processing, I prefer to serialize into the DmaBuffer at the application level, instead of providing the serialized content as a &[u8] which a stream will then copy into a DmaBuffer.
I could not stop myself from adding my 5 cents :-) I am a developer of the OrientDB storage engine, and I personally ran into this problem in the early stages of the engine's development, when we read a relatively small amount of data from the middle of a file. That is why I find it handy to have such methods available :-)
I am still struggling a bit to understand what exactly you are proposing. It's early and I am still going through my coffee (having cup 2 of 5 right now...). Maybe that's why =)
Is the issue that you'd like to provide us with a buffer that is already pre-serialized, instead of acquiring a DmaBuffer?
If that is your goal I would indeed not use the StreamWriter for that. Mind you, the goal of the StreamWriter is to provide an interface that is familiar to Rust users by implementing AsyncWrite. There are many downsides to using it, one of which is that it will indeed force a copy from a user-provided &[u8] to the DmaBuffer. But that's the only way things like write and write_all can work (as they use AsyncWrite under the hood).
The interface you propose would have to be built on top of the raw DmaFile. You can't escape acquiring DmaBuffers, because to issue Direct I/O you need specific requirements on your buffers. Right now it's mostly alignment, but I soon intend to use io_uring's registered buffer functionality, in which they also must come from a particular part of memory. But as soon as you have a DmaBuffer, you can do your serialization directly into it and then push.
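To make the alignment point concrete, here is a minimal sketch of an aligned buffer that a record is serialized into directly, with no intermediate &[u8]. `AlignedBuf` and `serialize_record` are hypothetical names and are not glommio's DmaBuffer; the 4096-byte alignment used in the usage note is an assumption for illustration, since the required alignment depends on the device and filesystem.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::ptr::NonNull;
use std::slice;

/// Hypothetical aligned buffer; it only illustrates the alignment
/// requirement and is NOT glommio's `DmaBuffer`.
pub struct AlignedBuf {
    ptr: NonNull<u8>,
    layout: Layout,
}

impl AlignedBuf {
    /// `align` must be a power of two and `size` must be non-zero.
    pub fn new(size: usize, align: usize) -> Self {
        assert!(size > 0, "zero-sized buffers are not supported in this sketch");
        let layout = Layout::from_size_align(size, align).expect("invalid size/alignment");
        // SAFETY: `layout` has a non-zero size, as checked above.
        let raw = unsafe { alloc_zeroed(layout) };
        let ptr = NonNull::new(raw).expect("allocation failed");
        AlignedBuf { ptr, layout }
    }

    pub fn as_bytes(&self) -> &[u8] {
        // SAFETY: `ptr` points to `layout.size()` initialized (zeroed) bytes we own.
        unsafe { slice::from_raw_parts(self.ptr.as_ptr(), self.layout.size()) }
    }

    pub fn as_bytes_mut(&mut self) -> &mut [u8] {
        // SAFETY: same as above, and `&mut self` guarantees exclusive access.
        unsafe { slice::from_raw_parts_mut(self.ptr.as_ptr(), self.layout.size()) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: `ptr` was allocated with exactly this layout.
        unsafe { dealloc(self.ptr.as_ptr(), self.layout) }
    }
}

/// Serialize a record's fields straight into the aligned buffer and
/// return the number of bytes used.
pub fn serialize_record(buf: &mut AlignedBuf, id: u64, value: u32) -> usize {
    let out = buf.as_bytes_mut();
    assert!(out.len() >= 12, "buffer too small for one record");
    out[..8].copy_from_slice(&id.to_le_bytes());
    out[8..12].copy_from_slice(&value.to_le_bytes());
    12
}
```

With something like `AlignedBuf::new(4096, 4096)`, the application writes each field once into memory that already satisfies the alignment constraint, which is the copy-avoiding flow described above.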
I am not against such an interface. On the contrary: I am of the opinion that the reason a lot of people these days think "memory = fast, storage = slow" is because the storage oriented APIs are horrible and full of copies everywhere.
I am vastly more concerned with reads, which is why, if you look, reads already provide such non-standard APIs. But for writes I have kept things simple for the moment.
All of this to say that I have no short-term interest in writing such an API, but if one is to be provided and it looks good enough I'd love to have it.
@laa so I went through your request again from the beginning.
So what you want is a reimplementation of those specific APIs -> write_all, read_exact, but without taking &[u8]?
I am not a fan of that because I think that gets very confusing very fast. Think of someone who already knows those APIs struggling to understand why ours does not match.
But alternative APIs, clearly marked as such? Then I am all for it.
Do take a look at the get_buffer_aligned API for the read stream: it seems to me similar to what you envision.
Hi @glommer, your first guess of what I intended to propose was correct :-). I am sorry for the silence, I was quite busy.
I do want to avoid the double-copy overhead for both reads and writes. My idea is to implement write_all/read_exact methods which accept a DmaBuffer, and the same methods for the BufferedFile for symmetry.
I wanted to implement such methods internally, but then thought that it would be handy to have them at the level of the library. If you still do not mind adding them, I will provide a patch with those changes.
P.S. I did notice that you do not support registered buffers, and that was my next topic to discuss, which I delayed until the creation of a community forum :-) Thank you for the update.
P.S.2 Congratulations on the first releases on crates.io :-)
I don't mind having those methods, but I think I would prefer them to have different names to avoid confusion. However, if you do intend to provide the patches I think it is best to discuss the concrete code anyway.
So do it, and we'll figure out the details!
About uring registered buffers, take a look at uring.rs. There is some code half working there, and placeholders for the ring to provide the buffers. I just ended up deciding to delay them so I could focus on finishing up other parts of the library. We'll get back to them, and hopefully registered files too.
Since you are dealing with files, please take a look at https://github.com/DataDog/scipio/issues/141
It just took me a while to figure this one out.
I suppose in this context it would be interesting to implement the AsyncBufRead trait, and to introduce similar AsyncBufWrite and AsyncBufWriteExt traits. The latter are being discussed for introduction. The main advantage is that they represent I/O tools which manipulate the buffers themselves and so avoid the double copy of data, unlike AsyncRead and AsyncWrite, which accept external buffers and as a result make a double copy of the data that is written or read.
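As a rough illustration of why a buffer-exposing trait avoids the extra copy, here is a sketch using the synchronous `std::io::BufRead`, whose `fill_buf`/`consume` pair is the blocking analogue of AsyncBufRead's `poll_fill_buf`/`consume`. The `checksum` function is a made-up example, not something proposed in this thread: it processes bytes in place in the reader's own buffer instead of copying them into a caller-supplied slice.

```rust
use std::io::{self, BufRead};

/// Sum all bytes of a reader by borrowing its internal buffer,
/// without copying the data into a caller-owned slice.
pub fn checksum<R: BufRead>(reader: &mut R) -> io::Result<u64> {
    let mut sum = 0u64;
    loop {
        let consumed = {
            // Borrow whatever the reader currently has buffered.
            let chunk = reader.fill_buf()?;
            if chunk.is_empty() {
                break; // EOF
            }
            for &b in chunk {
                sum += u64::from(b);
            }
            chunk.len()
        };
        // Tell the reader how much of its buffer we are done with.
        reader.consume(consumed);
    }
    Ok(sum)
}
```

A Read-style interface would instead require the caller to pass in a `&mut [u8]` for the reader to copy into, which is the second copy the comment above refers to.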
That seems like a good starting point.