dinit icon indicating copy to clipboard operation
dinit copied to clipboard

Replace use of iostream-based classes

Open davmac314 opened this issue 2 years ago • 45 comments

Unfortunately, C++ iostream facilities suck. They are heavyweight due to supporting different character encodings. They make it impossible to get useful error messages in a standardised way (the language spec allows for it, but the implementations apparently don't bother and map all error conditions to the same useless message). std::ios::failure extends std::system_error but doesn't use errno as the error code.

We should replace them (including use of cout/cerr) with a wrapper for POSIX file I/O.

davmac314 avatar Sep 11 '23 12:09 davmac314

Do you mean we need to use printf() and friends in all cases or replacing something like ifstream with pure C functions like open() or even both? for example: https://github.com/davmac314/dinit/blob/a6cff08795ffaea003c1e893c4b3ec96d81a6ae9/src/dinit-env.cc#L20-L26 to this?

// Read and set environment variables from a file. May throw std::bad_alloc, std::system_error.
void read_env_file(const char *env_file_path, bool log_warnings, environment &env)
{
    int env_file = open(env_file_path, O_RDONLY);
    if (env_file == -1) {
        return
    }
    //env_file.exceptions(std::ios::badbit);

mobin-2008 avatar Sep 11 '23 18:09 mobin-2008

No, not necessarily, I meant replace use of iostream (including fstream etc) with a similar wrapper class which uses open and write/read internally. It is still good to use a class rather than use POSIX functions directly as it allows using the RAII idiom.

davmac314 avatar Sep 12 '23 13:09 davmac314

I worked on it a bit and I really enjoy, @davmac314 Please assign this issue to me if you wish. for example I have a very basic implantation for replacing cout and cerr with very better code generation:

#ifndef DINIT_IOSTREAM_H
#define DINIT_IOSTREAM_H

#include <baseproc-sys.h>
#include <cstring>
#include <cmath>

// ToDo: Desc

namespace dio {

class ostream
{
    public:
        ostream operator<<(const char *msg)
        {
            bp_sys::write(1, msg, strlen(msg));
            // ToDo: Error handling
            return *this;
        }
        ostream operator<<(std::string msg)
        {
            bp_sys::write(1, msg.c_str(), msg.size());
            return *this;
        }
        ostream operator<<(int num)
        {
            char tmp[(int)((ceil(log10(num))+1)*sizeof(char))];
            sprintf(tmp, "%d", num);
            write(1, tmp, strlen(tmp));
            return *this;
        }
};

ostream cout;
// ToDo: cerr should print to fd 2 (stderr)
ostream cerr;
}

#endif

and dinitcheck compiled and ran with no issues. We need buffer for stdio?

mobin-2008 avatar Sep 12 '23 19:09 mobin-2008

the shift operator overloads are a bunch of gross 90s cargo cult, so while replacing them i'd suggest actually making up reasonable interfaces

low level file descriptor funcs are too primitive for reasonable i/o though, so the implementation may get relatively complex

q66 avatar Sep 12 '23 20:09 q66

the shift operator overloads are a bunch of gross 90s cargo cult, so while replacing them i'd suggest actually making up reasonable interfaces

With something like fmt?

mobin-2008 avatar Sep 12 '23 21:09 mobin-2008

@mobin-2008 -

Input is the more important issue, and more difficult.

For output, fmt is probably overkill. Look at log (in dinit-log.cc) for something closer to what I was thinking.

Yes, there should be buffering. Particularly for input but ideally for output also.

There are a lot of subtle issues involved in getting this right, so please think carefully, and consider the requirements of the project, existing style and practice, etc. Avoid memory allocation where practical. Throw exceptions appropriately. Provide quality code documentation (comments).

davmac314 avatar Sep 13 '23 10:09 davmac314

Note that your sample implementation already contains things I would outright reject:

  • variable length arrays - these are not valid in standard C++ and are an undesirable feature even in C
  • use of floating-point functions (log10) for something as simple as numerical (integral!) output. Look at existing dinit code.

davmac314 avatar Sep 13 '23 10:09 davmac314

After a while, I'm so happy to announce that iostream and fstream are mostly ready and just needs to be clean-up ed in some sections.

Feel free to take a look at it: https://github.com/mobin-2008/dinit/blob/iostream_replace/src/includes/dinit-iostream.h https://github.com/mobin-2008/dinit/blob/iostream_replace/src/dinit-iostream.cc https://github.com/mobin-2008/dinit/blob/iostream_replace/src/includes/dinit-fstream.h https://github.com/mobin-2008/dinit/blob/iostream_replace/src/dinit-fstream.cc

Best regards

mobin-2008 avatar Nov 08 '23 20:11 mobin-2008

@mobin-2008 I had a quick look. It's an ok start but I do have lots of concerns and comments.

Some stylistic things:

  • style is wrong in a few places, missing lines between functions is common.
  • missing documentation, it says // TODO: Desc, did you forget to push everything?
  • Unnecessary macros. Eg #define throw_on_eof() if (throw_on_eofbit) throw std::runtime_error("End Of File has been reached.") - I don't think this needs to be a macro. Avoid macros unless they are really necessary (which is almost never).
  • GOODBIT, FAILBIT and EOFBIT names seem to be based on std::iostreams but there they are actually (mostly) independent bits so the names make sense. In your implementation they are different values and cannot be set together so it doesn't seem to make sense to call them "bit".
  • defining both FAILBIT and also failbit, and so on for the other "bit" values, and having them be different values, is very awkward and may be confusing.
  • int _write(const char *msg); should not have a leading underscore.
  • some comments in istream eg // Set file descriptor, Designed for dinit-fstream.h usage should explain the function (purpose, how to use), not the intended user.
  • document which exceptions can be thrown (for functions that can throw).
  • comment for getline says // Read and append one line from given istream into given std::string but the function clears the string as first thing so it doesn't append at all. (Also if it did append to the string this would be inconsistent with std::getline behaviour which also erases the string).
  • ostream::exceptions method documentation doesn't say what values can be passed in. This is particularly awkward because there are both FAILBIT and failbit defined (for example). Also the implementation only handles badbit and nothing else. The documentation needs to be very clear.
  • ostream has good() and bad() but no fail(). (Is a "fail" state even possible? Is there another way to check it?)

Design-related:

  • 16kb circular buffer is a member of the class, that means these are pretty heavy-weight classes. That might be ok, but it needs to be clear in documentation. For example it means you probably only want to pass these by reference.
  • related: what happens if you copy an instance of these classes (eg istream)? Should copying be prohibited? (i.e. delete the copy constructor). What about moves?
  • no destructors for istream/ostream means that the file descriptor is left open when they are destructed. Contrast this to ofstream/ifstream. Why the difference in behaviour? I think this is confusing. It at least needs to be clearly documented, if not made more consistent.
  • output buffer isn't flushed when stream is destructed (or closed).
  • ostream::exceptions function only handles badbit?
  • ostream::_write always calls writev and tries to write the buffer contents, which means that output effectively isn't buffered in a lot of cases.

There's probably more but that is at already a lot of things to think about.

It's all very clearly based on the std::iostream design and I think it'd be good to really think about how you see these classes being used and whether you can improve on the std::iostream interface. Remember the whole point is to replace it with something better, not just to re-implement the exact same thing. Consider things like:

  • is throwing exceptions or not based on internal state (i.e. the mask set by exceptions function) really a good idea? Could we have separate throwing-vs-non-throwing functions instead? Or perhaps a single "checking" function which throws based on current state?
  • do we need bad/fail/eof states, are they all necessary, when do they occur? Your current design means that they can't occur simultaneously so does this mean that if I try to read but the stream is already at end-of-file I get "fail" or I get "eof"? (There needs to be clear documentation). What is the right way to check various things?
  • Could/should (some, or all) error conditions be returned as failure code (or even exceptions) from individual functions, rather than being kept as state in the stream object?

Note that these are genuine questions, I'm not saying the answer is "yes" or "no" in any case, but they should be considered. I'm not sure if you've thought about them. If you have, that's fine, but perhaps you should document your rationale as part of the documentation.

Finally: lots of documentation (comments) is needed! Right now so many things are unclear. It shouldn't be necessary to look at the implementation to understand how it should be used.

davmac314 avatar Nov 11 '23 02:11 davmac314

Most of style things are fixed. Also I started working on documention and ostream is now completely documented. bits are real "bits" now and can be set simultaneously.

ostream::_write always calls writev and tries to write the buffer contents, which means that output effectively isn't buffered in a lot of cases.

I disagree. put() (formerly _write()) only write on these situations:

/*
 * All calls to write() will be buffered but not necessarily written to file descriptor except
 * on these situations:
 *
 * 1. Message has \n (newline)
 * 2. "flush()" (or its exception-free variant: flush_nothrow()) be called
 * 3. "flush" object be passed to write() (equals to calling flush())
 * 4. "endl" object be passed to write() (equals to writing \n and calling flush())
 * 5. Object be destructed (which make a call to flush_nothrow())
 * 6. Buffer be full
 */

Which is OK in my opinion.

Could/should (some, or all) error conditions be returned as failure code (or even exceptions) from individual functions, rather than being kept as state in the stream object?

Hmm, I'm not so happy with returning error conditions in function because it makes into mess. I think a class should maintain its status.

@davmac314 Question: It's safe to assume that EINTR, EWOULDBLOCK and EAGAIN will be changed in one point? my flush() tries up to 10 times about those errnos. I afraid to they get stuck and the whole program get stuck on infinite while loop design.

Current ostream class documention (long description):

Details

/* ostream
 * =-=-=-=
 *
 * This class is designed to provide basic functionality around the output. It maintains
 * a cpbuffer (with a fixed size which can be changed in compile-time) and stores file descriptor number
 * in its private space. it supports empty constructor and constructor with file descriptor
 * as parameter:
 *
 *      dio::ostream output_a;    // Simple construction of ostream
 *      dio::ostream output_b(3); // Construction of ostream with file descriptor No.
 *
 * Also setting (or changing) file descriptor after construction is available by setfd()
 * function.
 *
 * copy/move constructors/assignments and destructor
 * -------------------------------------------------
 *
 * ostream copy constructor is prohibited (copy constructor is deleted).
 *
 * ostream move constructor is available. All variable contents except of the buffer
 * (and its contents), will be moved. Also buffer of moved object will be unavailable
 * (by reset() call).
 * Class definiton has full list of what will be moved from given object.
 *
 * ostream copy assignment is same as copy constructor.
 * ostream move assignment is same as move constructor.
 *
 * ostream desctructor will call flush_nothrow() on desctructing.
 *
 * Public member: write()
 * -----------------------
 *
 * ostream provides write() functions, which support many types of data for writing into fd(s).
 * See class definition for full list of types which supported by write() functions.
 * It's possible to combine different types in a single call with "," charactor also:
 *
 *      output_a.write("This is a example message.\n");
 *      output_b.write("2 + 2 equals to: ", 2 + 2, endl);
 *
 * All calls to write() will be buffered but not necessarily written to file descriptor except
 * on these situations:
 *
 * 1. Message has \n (newline)
 * 2. "flush()" (or its exception-free variant: flush_nothrow()) be called
 * 3. "flush" object be passed to write() (equals to calling flush())
 * 4. "endl" object be passed to write() (equals to writing \n and calling flush())
 * 5. Object be destructed (which make a call to flush_nothrow())
 * 6. Buffer be full
 *
 * write() functions guarantee to return number of actually written charactors not number
 * of buffered charactors. Also they can return -1 on failure cases (currently badbit).
 *
 * Public member: flush()
 * ----------------------
 *
 * ostream provides flush() function to write all contents of buffer. Passing flush object
 * to write() functions equals to call flush() on the stream. This function will return
 * number all written charactors or -1 on failure cases (currently badbit)
 *
 * Error handling and Stream states
 * --------------------------------
 *
 * ostream states are: goodbit and badbit.
 * goodbit is true by default and means "Everything seems to be normal".
 * badbit is 0 by default and means there is something wrong from POSIX I/O calls. It's get
 * set to POSIX errno in this case.
 *
 * For checking current stream states these functions are available:
 * rdstate(): Returns diostates::goodbit or diostates::badbit based on current state bits.
 * good(): Returns 1 or 0 based on goodbit is set or not.
 * bad(): Returns 0 or POSIX errno based on badbit value.
 *
 * For setting current stream state(s) these functions are available:
 * clear(): Sets badbit to 0 and goodbit to true.
 * setstate(): Set requested bit to requested value.
 *
 * NOTE: goodbit cannot be set directly to false by hand! goodbit has conflict with other bits
 *       and setting other bits (e.g. badbit) true will make goodbit false.
 *
 * Exception handling
 * ------------------
 *
 * exceptions() can be used to set when should a exception be thrown. For ostream it accepts
 * diostates::badbit. Also exceptions(0) will disable all cases which can throw exceptions.
 *
 * See description for each function to know what exceptions can be thrown by functions.
 *
 * All of exception-throwing functions have exception-free variants which marked by
 * "_nothrow" suffix. Those functions guarantee not to throw any exceptions regardless of
 * the exceptions() setting.
 */

mobin-2008 avatar Nov 17 '23 15:11 mobin-2008

I disagree. put() (formerly _write()) only write on these situations:

Ok, I see I misread it. But:

  1. Message has \n (newline)

Why flush just because of newline? And:

  1. "endl" object be passed to write() (equals to writing \n and calling flush())

Why does it call flush() if writing '\n' already does flush?

Question: It's safe to assume that EINTR, EWOULDBLOCK and EAGAIN will be changed in one point? my flush() tries up to 10 times about those errnos. I afraid to they get stuck and the whole program get stuck on infinite while loop design.

The stream library doesn't have sufficient context to make a decision about how to handle these errors, so it should report them to the application. It should not try the operation again, it's for the application to decide whether that is the right thing to do.

It might have made sense to have a mode which ignores and retries on EINTR, but I wouldn't bother with that now. For EWOULDBLOCK/EAGAIN just repeating the operation is completely wrong, you will end up repeating the same operation (with the same error) in a tight loop and send CPU usage through the roof. These conditions can only be cleared by something external (eg another process on the other end of the pipe) so you can't ever rely on that happening in any timeframe at all.

davmac314 avatar Nov 18 '23 00:11 davmac314

@mobin-2008 I will have a proper look over what you have when I get a chance (see comment above too; I think flush-on-newline probably isn't the right choice).

davmac314 avatar Nov 19 '23 11:11 davmac314

Sounds good, I'm going to clear some codes and documention in 30 minutes.

mobin-2008 avatar Nov 19 '23 11:11 mobin-2008

@davmac314 I worked on it a bit and here's the list for pending things:

  1. istream class documention (global members are documented completely)
  2. fstream library documention (code is small and self explanatory but having documention is good)

I try to complete these things ~in 4 days~ when I get a chance

mobin-2008 avatar Nov 19 '23 19:11 mobin-2008

@mobin-2008

I try to complete these things in 4 days when I get a chance

Ok - please open a PR when you are ready. I had a quick look and the major concerns are gone, but I will still have a lot of comments/questions.

davmac314 avatar Nov 25 '23 11:11 davmac314

@davmac314 General design question: I really can't find any meaningful usage for copy/move constructor/assignment in all clases. Do you know a meaningful usage of those in ostream, istream and so on?

mobin-2008 avatar Nov 25 '23 11:11 mobin-2008

@davmac314 General design question: I really can't find any meaningful usage for copy/move constructor/assignment in all clases. Do you know a meaningful usage of those in ostream, istream and so on?

For std::ifstream/std::ofstream it transfers ownership of the underlying file handle and buffer. For standard streams it is the only way to transfer ownership of a file handle since there is no (standard) way to get the file handle from the stream object. It's probably not used a lot in practice though.

For your current design which has stream classes that are "heavier" in the sense that they include the buffer inside the object (which suggests to me that you really don't want to have multiple stream objects to deal with one underlying file/stream), moving between streams probably makes even less sense (especially because it should be possible to get the underlying file handle anyway).

davmac314 avatar Nov 25 '23 12:11 davmac314

I reworked move constructor/assignment section and it looks like c++ stdlib now:

  1. ostream/istream has protected move c/a.
  2. ofstream/ifstream has public move c/a which pass address of the buffer of old object to streambuf *buf

Hmm, But it is inefficient in memory because all of the new created objects will have a streambuf mainbuf by default which will wasted storage in moved objects: Example usage:

int main() {

    std::string str;
    dio::ifstream i2;

    {
        dio::ifstream i1("file");
        i1.open();
        if (!i1) return -1;

        i1.getline(str, '\n');

        i2 = std::move(i1);
    }

    i2.getline(str, '\n');

    return 0;
}

Debugger trace of i2 object

(lldb) print i2
(dio::ifstream) {
  dio::istream = {
    mainbuf = {
      cpbuffer<16384> = (buf = ""..., cur_idx = 0, length = 0)
    }
    buf = 0x00007fffffff6738
    fd = 3
    goodbit = true
    eofbit = false
    failbit = false
    badbit = 0
    eofreached = true
    throw_on_eofbit = false
    throw_on_failbit = false
    throw_on_badbit = false
  }
  path = 0x00005555555556e4 "file"
  are_we_open = true
}
(lldb) print i2.buf
(dio::streambuf *) 0x00007fffffff6738
(lldb) print *i2.buf
(dio::streambuf) {
  cpbuffer<16384> = (buf = "**********---***********\n*******- _   _ -********\n******-  0   0  -*******\n******-    |    -*******\n******-   ___   -*******\n*******-       -********\n**********---***********\n***********-************\n***********-************\n**********---***********\n*********-*-*-**********\n********-**-**-*********\n*******-***-***-********\n******-****-****-*******\n***********-************\n***********-************\n**********-*-***********\n*********-***-**********\n********-*****-*********\n*******-*******-********\n******-*********-*******\n"..., cur_idx = 50, length = 475)
}

Documention should advise that using std::move due to current design is memory inefficient and assigning object pointer to a istream * or ostream * is preferred.

@davmac314 WDYT?

mobin-2008 avatar Nov 26 '23 20:11 mobin-2008

I am confused by your debugger output, it doesn't seem to match the current code. I guess you didn't push the updated code?

It looks like you have added a streambuf abstraction wrapping the circular buffer, which is new. Also it looks like ifstream has both an internal streambuf (mainbuf) and a pointer to a streambuf (buf). Why does it have both?

davmac314 avatar Nov 26 '23 22:11 davmac314

(The reason std::ifstream can do a move is because it can transfer ownership of the streambuf. If the buffer is inside the stream object itself, there's no way to do that).

davmac314 avatar Nov 26 '23 22:11 davmac314

I pushed a new version and it's mostly about buffer's memory allocation. Now there is streambuf_mgr class and streambuf_allocator object which maintains all of streambuf used in the whole program. I tried to make them into safest status. Here is the description of this new class:

/* streambuf_mgr
 * =-=-=-=-=-=-=
 *
 * streambuf_mgr is buffer manager which provides functions about
 * allocating and deallocating streambuf buffers.
 *
 * All of streambuf_mgr functionaliy and interfaces are exception-free.
 *
 * copy/move constructors/assignments and destructor
 * -------------------------------------------------
 *
 * streambuf_mgr copy constructor is prohibited (copy constructor is deleted).
 * streambuf_mgr copy assignment is same as copy constructor.
 * streambuf_mgr move constructor is prohibited (move constructor is deleted).
 * streambuf_mgr move assignment is same as move constructor.
 *
 * streambuf_mgr destructor will deallocate all of the registered buffers.
 *
 * Public member: allocate_for()
 * -----------------------------
 *
 * allocate_for() is a function to allocate a streambuf and will return that streambuf
 * pointer. allocate_for will register all buffers in a std::list (and probably through
 * heap allocation) but the streambuf itself will be on stack.
 *
 * Public member: deallocate()
 * ---------------------------
 *
 * deallocate() is a function to deallocate an streambuf. This function will earse the allocated buffer
 * from allocated_buffers std::list.
 *
 * Public member: setowner()
 * -------------------------
 *
 * setowner() is a function to set the allocated buffer is for which object. this function will find allocated
 * buffer and change its "allocated_for" value to passed void pointer.
 */

...

// streambuf_allocator is designed to be responsible for all of buffer allocation
// in the whole program and you probably don't want to create another object
// from streambuf_mgr.
extern streambuf_mgr streambuf_allocator;

I think it's most memory efficient solution that we have.

mobin-2008 avatar Nov 29 '23 12:11 mobin-2008

Also ostream/istream can have constructor which has a custom streambuf* argument to bypass allocation from streambuf_base, In this case the user should ensure about availability of used buffer. e.g It's useful for cout, cerr and cin which are global objects in the whole program or on situations when there is chance of memory non-availability for heap allocation.

If it's ok, I will fix ostream description to represent these new cases.

mobin-2008 avatar Nov 29 '23 21:11 mobin-2008

@davmac314 I think the implementation is ready and just some documentation is missing (because I'm waiting to they get stabilized).

mobin-2008 avatar Nov 30 '23 13:11 mobin-2008

Sorry for spam :(

Last major change: diostates items are now more self-explanatory:

/* All stream state bits.
 *
 * "goodbit" means "everything looks ok" and has conflict with
 * other bits (e.g. setting eof as true will make "good" false).
 * "eofbit" means We are on End Of File and there is nothing left in buffer.
 * "buffer_failbit" means tried to use buffer when buffer point was nullptr.
 * "string_failbit" means there is failed operation at std::string related calls.
 * "posixio_failbit" means a POSIX I/O function failed and errno was set.
 *
 * rdstate() function returns stream's current status based on this.
 * setstate() function only accepts these values as its "bits".
 */
enum diostates
{
    goodbit = 0x01,
    eofbit = 0x02,
    buffer_failbit = 0x04,
    string_failbit = 0x08,
    posixio_failbit = 0x10
};

Also read() is public function for just reading and putting into the buffer without anything else.

mobin-2008 avatar Nov 30 '23 21:11 mobin-2008

@mobin-2008 can you explain the streambuf changes to me please.

Now there is streambuf_mgr class and streambuf_allocator object which maintains all of streambuf used in the whole program.

Why is that necessary? What is the benefit compared to the original design?

davmac314 avatar Nov 30 '23 22:11 davmac314

@davmac314

streambuf is just a cpbuffer with fixed size.

streambuf_base is a class which maintains a streambuf pointer and setbuf(), getbuf() functions to interact with that. It's designed to avoid duplication in ostream/istream.

When maintaining streambuf in class itself is expensive and has some problems around the std::move, streambuf_allocator shows itself, streambuf_allocator is designed to create and maintain buffers outside of the class itself.

mobin-2008 avatar Nov 30 '23 22:11 mobin-2008

When maintaining streambuf in class itself is expensive and has some problems around the std::move,

Do we need streams to be movable?

But alright, let's agree that it's probably better to keep the buffer outside the class because it's better to avoid large allocations on the stack. (And as a side benefit it's now feasible to move iostream objects, whether that's actually needed or not.)

However, what is the point of streambuf_mgr?

davmac314 avatar Nov 30 '23 23:11 davmac314

However, what is the point of streambuf_mgr?

streambuf_allocator is an instance of streambuf_mgr.

mobin-2008 avatar Nov 30 '23 23:11 mobin-2008

You are not actually explaining. I can see that streambuf_allocator is an instance of streambuf_mgr. Why are they needed?

davmac314 avatar Nov 30 '23 23:11 davmac314

What I mean is: why can't an iostream just allocate its own buffer? You can do dynamic allocation with new. Does using streambuf_allocator/streambuf_mgr have any advantage over that?

davmac314 avatar Nov 30 '23 23:11 davmac314