libiio icon indicating copy to clipboard operation
libiio copied to clipboard

re-work libiio debug message logging

Open rgetz opened this issue 4 years ago • 21 comments

Error handling

Typically libiio functions return 0 on success or a negative error code on failure. These negative error codes relate to system error code, which can be multiplied by -1 and passed directly to strerror().

Debug message logging

Currently - libiio squirts out random messages on stdout or stderr. This is problematic for many applications.

The goal is to move to the model that libusb uses a newly imagined iio_context_set_debug() which would be used to enable stdout/stderr logging of certain messages, on a context by context basis.

The goal would be to use the existing levels:

  • NoLog : no messages ever printed by the library (default). Applications are therefore free to close stdout/stderr and those descriptors may be reused without worry.
  • Error : error messages are printed to stderr
  • Warning : warning and error messages are printed to stderr
  • Info : Informational messages are printed to stdout, warning and error messages are printed to stderr
  • Debug : Debug and Informational messages are printed to stdout, warning and error messages are printed to stderr

The logged messages are unstructured. There is no one-to-one correspondence between messages being logged and success or failure return codes from libiio functions. There is no format to the messages, and no guarantee about forwards/backwards compatibility, so you should not try to capture or parse them. They are not and will not be localized. These messages are not suitable for being passed to your application user; instead, the application developer should interpret the error codes returned from libiio functions and provide appropriate notification to the user. The messages are simply there to aid you as a programmer/application developer, and if you're confused because you're getting a strange error code from a libiio function, enabling message logging may give you a suitable explanation.

It is planned that libiio can be configured at cmake time to build without any logging functions at all, useful for embedded systems. In this case, iio_context_set_debug() will have no effects.

rgetz avatar Jun 03 '20 17:06 rgetz

will need to handle some things in iiod differently - and most likely take -v -vv and -vvvv on command line.

rgetz avatar Jun 09 '20 00:06 rgetz

Need to handle network and usb errors differently - right now, we always translate things into something that can be passed to strerr, or perror - but this is problematic, since networking errors (in /usr/include/netdb.h)

/* Error values for `getaddrinfo' function.  */
# define EAI_BADFLAGS     -1    /* Invalid value for `ai_flags' field.  */
# define EAI_NONAME       -2    /* NAME or SERVICE is unknown.  */
# define EAI_AGAIN        -3    /* Temporary failure in name resolution.  */
# define EAI_FAIL         -4    /* Non-recoverable failure in name res.  */
# define EAI_FAMILY       -6    /* `ai_family' not supported.  */
# define EAI_SOCKTYPE     -7    /* `ai_socktype' not supported.  */
# define EAI_SERVICE      -8    /* SERVICE not supported for `ai_socktype'.  */
# define EAI_MEMORY       -10   /* Memory allocation failure.  */
# define EAI_SYSTEM       -11   /* System error returned in `errno'.  */
# define EAI_OVERFLOW     -12   /* Argument buffer overflow.  */

are supposed to use their own error functions gai_strerror(), since they collide with errors in /usr/include/asm-generic/errno-base.h

#define EPERM            1      /* Operation not permitted */
#define ENOENT           2      /* No such file or directory */
#define ESRCH            3      /* No such process */
#define EINTR            4      /* Interrupted system call */
#define EIO              5      /* I/O error */
#define ENXIO            6      /* No such device or address */
#define E2BIG            7      /* Argument list too long */
#define ENOEXEC          8      /* Exec format error */

the causes bad ip addresses ip:foo.bar.foo (error -2, NAME or SERVICE is unknown) to print out error strings : No such file or directory using iio_strerr rather than the actual error (from gai_strerror : Name or service not known).

Same is true for libusb It has it's own error string function libusb_strerror which we currently do not use.

Would suggest to "space" specific errors by adding -1000 for networking errors, and -2000 for usb errors, which then iio_strerr() would dispatch properly.

rgetz avatar Jul 05 '20 15:07 rgetz

Can we just add a string to the errors to explicitly define which backend they get generated from?

tfcollins avatar Jul 23 '20 01:07 tfcollins

Would suggest to "space" specific errors by adding -1000 for networking errors, and -2000 for usb errors, which then iio_strerr() would dispatch properly.

this could work neatly i'd also add Travis' idea to print the backend as a prefix to the message; maybe couple the ideas a bit; iio_strerr() to also include the backend prefix obtained from the error space; and also each backend should maybe add it's own prefix to any error message that it prints;

overall, i like the idea; not much else to add from my side;

commodo avatar Jul 23 '20 06:07 commodo

Agreed. Errors are propagated by numbers. Therefore there is no alternative to paging errors by adding offsets. iio_strerr() needs to de-multiplex, and should internally use libusb_strerror() if the number falls into this page.

mhennerich avatar Jul 23 '20 08:07 mhennerich

This would also allow us to have libiio error codes - which dispatch to iio specific strings. "IIO attributes not found", etc - which might improve usability (end user understanding).

Other question is - right now, we are using errno (which is fine - we are a library)

Libusb - is a little different:

int libusb_init(libusb_context **ctx) This function initialises libusb. 
             It must be called at the beginning of the program, before 
             other libusb routines are used. This function returns 0 on 
             success or LIBUSB_ERROR on failure.

libiio is:

struct iio_context * iio_create_context_from_uri(const char *uri);
 * @brief Create a context from a URI description
 * @param uri A URI describing the context location
 * @return On success, a pointer to a iio_context structure
 * @return On failure, NULL is returned and errno is set appropriately

libusb only leaves valid/standard values in errno that can be decoded by strerror() directly. (If we were to page things - this would no longer be true.).

Should we clobber things? - or make a new iio_errno (which would need to be thread safe - so something like:

/* in utilities.c */
__thread int iio_errno;

/* in iio.h */
extern __thread int iio_errno
#define iio_errno(err) (iio_errno = (err))
#define iio_errno (iio_errno)

and basically leave errno untouched from what it is today?

rgetz avatar Jul 23 '20 13:07 rgetz

Should we clobber things?

ummm, this has no easy answer, because there are pitfalls in both directions; if we clobber errno, people using strerror() complain they get unknown error code if we page things [1] if we add iio_errno, some people will check errno and not see an error; so maybe set errno as well if it's not set [2]

i would say that [2] is a little better than [1] going forward;

we could also do an iio_get_last_error() call that returns errno of another error code [that is paged]; internally we could use some iio_set_last_error(<usb, ip, local backend>, errno/ret/error-code) function that stores the error [maybe to errno] and does some logic if needed; the main benefit of this iio_set_last_error() is that we centralize setting errno; if we assign it directly we're not completely sure that errno is sane/consistent; also, with iio_set_last_error() it may be easier if we change our minds about errno or iio_errno

we could maybe start with an internal iio_set_last_error() implementation and see from there; after we populate libiio with iio_set_last_error() and see how many errors are set in USB, Network, Local, Libiio error spaces, we probably get a better feel for what's better next, and whether we should continue doing the error spaces or just unifying errno, or whether iio_errno is a good idea; [ this doesn't mean that i have a strong opinion for/against any of these options]

commodo avatar Jul 24 '20 13:07 commodo

The problem with the iio_get_last_error() is that there is no state in libiio itself. we would need to store that in the actual context, which many times is not available in the libiio leaf function. It's easier/more straightforward just to set a global...

I guess my thought would be to leave errnno values the same (set them as is), and add a new libiio specific thing iio_errno - which end users can eventual migrate to.

rgetz avatar Jul 24 '20 13:07 rgetz

I would vote for paging and iio_strerr(). We leave the regular errno numbers where they are, and only page USB and Network.

mhennerich avatar Jul 24 '20 14:07 mhennerich

Half of the problem is addressed in my PR #681. It makes it possible to specify the log output and verbosity per-context.

As for error handling, I think we should get rid of errno codes completely. Many times I had to resort to using errno codes that didn't really represent the actual error, e.g. the EINVAL joker card, which means that when the API function eventually returns the error, you have no idea what the actual error is.

So we need libiio-specific error codes, with a public iio-error.h header that can be used by applications should they need to handle particular errors. The library should never return libusb/network codes directly, only error codes that are listed in this iio-error.h file. Then iio_strerror() can be updated to handle these error codes.

As for the errno variable itself, I would like to propose a different solution. Since we only really need it for the few functions that return a pointer, we can use the Linux kernel (and Rust) approach to return pointer-encoded errors. Error state can be checked with a IS_ERR() macro, and the error value can be obtained using the PTR_TO_ERR() macro. This is by the way already being used within libiio.

This obviously being a ABI-breaking change, it wouldn't be for libiio 0.22, but for the next libiio 2.

pcercuei avatar May 13 '21 10:05 pcercuei

As for error handling, I think we should get rid of errno codes completely...

I guess we could always extend it... But anyways, if libiio was to only work with linux, I would most likely not agree with not using errno and typical error codes. Like it or not, it's here for a lot of time, it's known (and familiar) and it's what the OS will give you from an ABI point of view... But since we do want to be as platform independent as possible, I do like the idea of our iio-error layer that keeps things consistent for users.

using the PTR_TO_ERR() macro

very nit comment: I would rather keep the more familiar name :smile:

nunojsa avatar May 14 '21 07:05 nunojsa

@nunojsa how do you differenciate between a parser error ("unknown command") and e.g. an attribute write ("unknown attribute") using standard errno? On both cases you get -EINVAL which is translated to a bland "invalid argument".

As for PTR_TO_ERR: I volontarily changed the macro name because I always get confused between PTR_ERR and ERR_PTR when writing kernel code :(

pcercuei avatar May 14 '21 08:05 pcercuei

how do you differenciate between a parser error ("unknown command") and e.g. an attribute write ("unknown attribute") using standard errno?

I do agree that in some cases there's no real distinction. That is why I mentioned extending it... But both of these situation would fall in the EINVAL, because in the end of the day, we get an error because someone called the API with invalid/unknown arguments... It's just that we cannot differentiate in which argument we failed. Arguably, this is something that also would have to be fixed at "devolpement/evaluation" time (from the app perspective), hence if we do provide enough verbosity in error paths, the app could easily check which argument is invalid and fix it.

But I do not really know all the usecases that we might have with libiio/iiod and for a platform independent library I agree with you that an libiio-error layer sounds good. We will just have to make sure that we translate things accordingly and in some cases I guess we will just pass-trough normal linux error codes since that is what we will get from the kernel anyways (and some fit nicely)...

get confused between PTR_ERR and ERR_PTR when writing kernel code

And I think this is only an "issue" for people that are used to do some kernel code...

nunojsa avatar May 14 '21 08:05 nunojsa

how do you differenciate between a parser error ("unknown command") and e.g. an attribute write ("unknown attribute") using standard errno?

I do agree that in some cases there's no real distinction. That is why I mentioned extending it... But both of these situation would fall in the EINVAL, because in the end of the day, we get an error because someone called the API with invalid/unknown arguments... It's just that we cannot differentiate in which argument we failed. Arguably, this is something that also would have to be fixed at "devolpement/evaluation" time (from the app perspective), hence if we do provide enough verbosity in error paths, the app could easily check which argument is invalid and fix it.

Ok, let's take another example: ETIMEDOUT. You're running over the network, your iio_device_get_buffer() returned ETIMEDOUT. Can you tell me what happened?

But I do not really know all the usecases that we might have with libiio/iiod and for a platform independent library I agree with you that an libiio-error layer sounds good. We will just have to make sure that we translate things accordingly and in some cases I guess we will just pass-trough normal linux error codes since that is what we will get from the kernel anyways (and some fit nicely)...

get confused between PTR_ERR and ERR_PTR when writing kernel code

And I think this is only an "issue" for people that are used to do some kernel code...

Then everybody else won't mind having a PTR_TO_ERR macro...

pcercuei avatar May 14 '21 15:05 pcercuei

Ok, let's take another example: ETIMEDOUT. You're running over the network, your iio_device_get_buffer() returned ETIMEDOUT. Can you tell me what happened?

Well, I would just assume that I could not get my buffer in the timeframe I defined to get it... The actual reason for the timeout can be a lot of different ones that can be even dependent on the network card driver we are running on. For example, in this case I would assume we have a timed poll() before we do the actual recv(). If poll() timeouts, I think ETIMEDOUT is not such an horrible error code. It kind of states what happened... How could we actually know the reason for the timeout? For sure errno error codes won't fit all usecases [that's not possible] and probably not in some specifics cases of libiio but when dealing with the kernel, it is what we get...

Again, I don't really have the insights on libiio to argue much here :).

nunojsa avatar May 14 '21 21:05 nunojsa

ETIMEDOUT could mean that your network connection was (maybe temporarily) down, but that's assuming it was returned by the network backend. It could very well be returned by the local network, or by the kernel, on the remote end. With just one errno value for both cases there's absolutely no way to know where is the problem, and therefore it's useless.

pcercuei avatar May 14 '21 22:05 pcercuei

I do not think ETIMEDOUT will be returned by the kernel both for poll or recv. As for the connection issue, the only thing I guess we can do is to detect connection closure by getting end of file in recv. Then, we have other codes to return other than ETIMEDOUT. I do agree that there's no way to distinguish between local vs remote issues (like local vs remote poll timeout), so encoding something in iiod might add some value. I'm just not sure how significant is the added value. Well, for sure you have something in mind so I might change my mind when I see some code...

nunojsa avatar May 15 '21 09:05 nunojsa

I wrote "by the kernel, on the remote end" though. It can be returned by the IIO drivers.

pcercuei avatar May 15 '21 10:05 pcercuei

Ups, I wrongly closed this... Definitely it can be returned by the IIO driver (on accessing attributes files) and in that case I guess we could do some distinction regarding a poll timeout for example.

nunojsa avatar May 15 '21 11:05 nunojsa

libiio specific error codes - make sense, but passing up the specific libusb errors also do - don't you think?

rgetz avatar Jun 04 '21 18:06 rgetz

I will close this - since I think we have an implementation in the dev branch

rgetz avatar Mar 03 '22 19:03 rgetz