lv_binding_micropython icon indicating copy to clipboard operation
lv_binding_micropython copied to clipboard

Formulate a concise list of Coding Convention requirements for a supported API

Open amirgon opened this issue 3 years ago • 17 comments

Related: https://github.com/lvgl/lvgl/issues/3535#issuecomment-1232489218

A short and easy to follow list about MP conventions.
This will be communicated as LVGL coding standard. Less is better so we should have as few rules as possible

amirgon avatar Sep 01 '22 20:09 amirgon

We can add the points to check in the PR template: image

kisvegabor avatar Sep 02 '22 07:09 kisvegabor

Here are the things that this list should cover, ordered by importance.
Importance is based on how frequent these are missed, and how severe the problem is when they are missed.

  • [ ] Define gc roots where needed
  • [ ] Follow callback conventions
  • [ ] Follow function naming
  • [ ] Use enums instead of macros
  • [ ] How to define strings and arrays
  • [ ] Use typed pointers instead of void pointers

Related: https://github.com/lvgl/lvgl/issues/1763

amirgon avatar Oct 01 '22 20:10 amirgon

Define gc roots where needed

In my experience people usually don't know what a GC root is. We should explain the gist briefly. E.g. Do notmalloc into a static or global variables. Instead declare the variable in LV_ITERATE_ROOTS list in lv_gc.h and mark the variable with GC_ROOT(variable) when it's used.

Follow callback conventions

How to summarize it in 1-2 sentences?

Follow function naming

How to summarize it in 1-2 sentences?

Use enums instead of macros

Prefer enums instead of macros. If inevitable to use defines export them with LV_EXPORT_CONST_INT(defined_value) right after the define.

How to define strings and arrays

In function arguments prefer some_type name[] declaration for array parameters instead of some_type * name

Use typed pointers instead of void pointers

Prefer typed pointers instead of void *.

kisvegabor avatar Oct 02 '22 16:10 kisvegabor

Hmm, is it feasible to write a detector for missing GC roots (only for test purposes, not for regular applications)? It seems that the general idea would be to keep a list of all malloc allocations and verify that they are reachable from at least one of the GC roots. We have been bitten by that specific convention enough times in my opinion. :laughing:

embeddedt avatar Oct 02 '22 19:10 embeddedt

Any idea how to test it? :smile:

kisvegabor avatar Oct 02 '22 20:10 kisvegabor

The above was just a list. See my suggestion below for "Define gc roots where needed" bullet, comments are welcome!

is it feasible to write a detector for missing GC roots

It would be best to detect that with Static Analysis, where we don't run any code so we don't need test coverage.

What you suggest is Dynamic Analysis which requires running the code.
To run the code we would need an example. If we already have an example that runs the code (and also a Micropython example!) then we'll hit the problem anyway, either with Dynamic Analysis or without it.

Indeed it could be nicer to get a verbose error message about a missing gc-root instead of a segmentation fault, but we'll need to see if it's worth the effort.

btw, allocations don't have to be reachable only from gc roots - they can also be reachable from the stack, so stack would need to be scanned too. And these allocation chains are not limited to lv_malloc, and would usually go through Micropython allocated objects as well.

A different idea (possibly simpler?) is to find out what parts of the data segment is used for LVGL pointers (can probably be done by querying ELF debug symbols), then scan these pointers to see if any of them is pointing to Micropython heap (the area that is garbage-collected). Ideally this should be tested right before Micropython garbage collection, but we can also test this sooner.


Define gc Roots Where Needed

Background

When LVGL runs in Micropython, all dynamic memory allocations (lv_malloc) are handled by Micropython's memory manager which is garbage-collected (GC).
To prevent GC from collecting memory prematurely, all dynamic allocated RAM must be reachable by GC.
GC is aware of most allocations, except from pointers on the Data Segment:

  • Pointers which are global variables
  • Pointers which are static global variables
  • Pointers which are static local variables

Such pointers need to be defined in a special way to make them reachable by GC

Identify The Problem

Problem happens when either global, static global or static local pointer variable is allocated dynamically (with lv_malloc)

Solve The Problem

  • Replace the global/static local var with LV_GC_ROOT(_var)
  • Include lv_gc.h on files that use LV_GC_ROOT
  • Add _var to LV_ITERATE_ROOTS on lv_gc.h

Example

https://github.com/lvgl/lvgl/commit/adced46eccfa0437f84aa51aedca4895cc3c679c

amirgon avatar Oct 02 '22 21:10 amirgon

Can we use the MP binding generator to somehow detect functions/patterns not following the convention? E.g. searching for specific text in lv_mpy.c like "error" if the the binding generation failed.

kisvegabor avatar Oct 02 '22 21:10 kisvegabor

Can we use the MP binding generator to somehow detect functions/patterns not following the convention? E.g. searching for specific text in lv_mpy.c like "error" if the the binding generation failed.

In a limited way - we can probably detect some cases.
Today when the binding script doesn't find user_data in a callback - it doesn't fail. It simply doesn't generate the binding for that callback. I could easily change the script to fail with an error in such cases.

For example, you can find this in the generated bindings ports/unix/build-dev/lvgl/lv_mpy.c:

/*
 * Function NOT generated:
 * Missing 'user_data' as a field of the first parameter of the callback function 'lv_anim_t_exec_cb_callback'
 * lv_anim_exec_xcb_t exec_cb
 */

However, if user_data appears as an argument but is not passed correctly from the callback registration function to the callbacks themselves - we will not detect that problem.

It's hard to detect other things.
The script can't know when it makes sense to use void * instead of Type * for example. The script can't guess if a function should be a member of a class or a struct when it's not named correctly, perhaps it is just a global function under lvgl module. etc.

amirgon avatar Oct 02 '22 21:10 amirgon

I think showing at least some potential user_data issues would be an important step. But I wonder how can we filter out the the known "Function NOT generated" reports.

kisvegabor avatar Oct 02 '22 21:10 kisvegabor

I think showing at least some potential user_data issues would be an important step. But I wonder how can we filter out the the known "Function NOT generated" reports.

I agree, but let's first formulate the Coding Convention list, and deal with this later.

I've added above an example for "Define gc Roots Where Needed". Could you comment on this?

  • How is the detail level? Is it concise enough?
  • I've split it to "Background" to explain the issue in general, a section for identifying the problem, a section for solving the problem and a link to an example. Do you think this split is good?
  • This should probably be part of the docs. Should it also go into the LVGL PR template? Or maybe only as a link?

amirgon avatar Oct 07 '22 20:10 amirgon

@amirgon I think the example is great!

If we formulate a list of conventions in the docs then I think we just need to add a link in the current checklist.

embeddedt avatar Oct 07 '22 23:10 embeddedt

For GC roots I can imagine it like:

Do not malloc into a static or global variables. Instead declare the variable in LV_ITERATE_ROOTS list in lv_gc.h and mark the variable with GC_ROOT(variable) when it's used. Lear more [here](link to docs with the longer explanation).

I think GC will be good this way.

The 2 other important points are callback and naming conventions. I've grouped the related notes from https://github.com/lvgl/lvgl/issues/1763 . Is there anything else? If the list is ready we can simplify and shorten them.

  • Follow callback conventions

    • There's a struct that contains a field called void * user_data.
    • A pointer to that struct is provided as the first argument of a callback registration function.
    • A pointer to that struct is provided as the first argument of the callback itself. OR
    • A parameter called void * user_data is provided to the registration function as the last argument.
    • The callback itself recieves void * user_data as the last argument
    • Callbacks that are not available for the Binding must include the string xcb
  • Follow function naming

    • Widget member function must start with lv_<widgetName>
    • Widget member function must receive lv_obj_t * as its first argument, and this argument must be a pointer to the Widget objects itself.
    • Widget constructor must be named lv_<widgetName>_create and receive lv_obj_t * par as its first argument. It can have additional arguments.
    • Member functions of structs should follow the Widget conventions. They should receive a pointer to the struct as the first argument, and the prefix of the struct name should be used as the prefix of the function name (for example lv_indev_drv_register(lv_indev_drv_t * driver) )
    • Avoid ambiguity in between struct members and struct member functions. (For example start in lv_anim_t)
    • Functions which are not part of the public API must begin with underscore, to mark them as "private".
    • Some structs should not be used directly by the user. Instead, LVGL provides many setter/getter functions. For example lv_anim_t has lv_anim_set_path to set the path instead of directly writing to path field of lv_anim_t. In order to minimize the API size, it's better to mark such structs as "private" by prefixing them by underscore. This way the binding script would know not to wrap every field in them, and treat them as opaque objects instead.
    • Argument must be named also on H files

kisvegabor avatar Oct 09 '22 18:10 kisvegabor

For GC roots I can imagine it like:

Do not malloc into a static or global variables. Instead declare the variable in LV_ITERATE_ROOTS list in lv_gc.h and mark the variable with GC_ROOT(variable) when it's used. Lear more [here](link to docs with the longer explanation).

I think GC will be good this way.

Sounds good.

The 2 other important points are callback and naming conventions. I've grouped the related notes from lvgl/lvgl#1763 . Is there anything else? If the list is ready we can simplify and shorten them.

The list covers it I think.
But something when wrong with the copy-paste from lvgl/lvgl#1763, for example lv__create should have been lv_<widgetName>_create, etc.

amirgon avatar Oct 09 '22 22:10 amirgon

I've tried to summarize the function name and callback conventions. What do you think?

  • Widget constructor must follow the lv_<widget_name>_create(lv_obj_t * parent) pattern.
  • Widget member function must start with lv_<widget_name> and should receive lv_obj_t * as first argument which is a pointer to widget object itself.
  • structs should be used via an API and not modified directly via their elements.
  • struct APIs should follow the widgets' conventions. That is to receive a pointer to the struct as the first argument, and the prefix of the struct name should be used as the prefix of the function name too (e.g. lv_disp_set_default(lv_disp_t * disp))
  • Functions and structs which are not part of the public API must begin with underscore in order to mark them as "private".
  • When registering callbacks in a struct, that struct should contain a void * user_data field. A pointer to the struct should be the first argument of the callback registration function. The callback should receive either a pointer to the struct as its first argument OR the same user_data which was used at registration as its last argument. Callbacks not following these conventions should contain the string xcb.
  • Argument must be named in H files too.

kisvegabor avatar Oct 11 '22 18:10 kisvegabor

  • When registering callbacks in a struct, that struct should contain a void * user_data field. A pointer to the struct should be the first argument of the callback registration function. The callback should receive either a pointer to the struct as its first argument OR the same user_data which was used at registration as its last argument. Callbacks not following these conventions should contain the string xcb.

I think that callbacks, like gc, deserve some background and examples.
Most of the issues we had with Micropython bindings compatibility were related to callbacks. People just don't get the point of user_data and even when they add it they don't understand why it's needed and often use it the wrong way.
I also think it's better to clearly show the two options for callbacks, one with user_data as struct member and the other with user_data as function argument, and provide an example for each.
It's also important to point out that the same user_data that is provided in the callback registration must be passed over to the callback when it's called.

  • structs should be used via an API and not modified directly via their elements.

This is not accurate. The bindings knows how to handle struct members and allows the user to read and write them.

The goal here was to minimize the API and reduce the binding code size, by making structs private and exposing only their public members through getter/setter API functions.
But if a struct is fully public and the user is supposed to access all its members anyway, there is no point in adding an API to access it. In fact, this would be wasteful and make the API bigger for no reason.

amirgon avatar Oct 12 '22 21:10 amirgon

I think that callbacks, like gc, deserve some background and examples.

We can add link to more details but it's important have clear rules that we can easily check without any boilerplate.

I also think it's better to clearly show the two options for callbacks, one with user_data as struct member and the other with user_data as function argument, and provide an example for each. It's also important to point out that the same user_data that is provided in the callback registration must be passed over to the callback when it's called.

It's mentioned, but do you think it's not clear enough? "The callback should receive either a pointer to the struct as its first argument OR the same user_data which was used at registration as its last argument."

structs should be used via an API and not modified directly via their elements.

This is not accurate. The bindings knows how to handle struct members and allows the user to read and write them.

It's a general LVGL rule too. The only place where we have used direct struct access was the drivers but I've already changed it in arch/drivers. I've also hidden the huge lv_disp_t and lv_indev_t types in lv_disp_priv.h and lv_indev_priv.h.

kisvegabor avatar Oct 13 '22 11:10 kisvegabor

It's mentioned, but do you think it's not clear enough? "The callback should receive either a pointer to the struct as its first argument OR the same user_data which was used at registration as its last argument."

It's a long and a bit convoluted sentence... I think that two separate "bullets" that present each option might be clearer.

It's a general LVGL rule too. The only place where we have used direct struct access was the drivers but I've already changed it in arch/drivers. I've also hidden the huge lv_disp_t and lv_indev_t types in lv_disp_priv.h and lv_indev_priv.h.

Private structs and headers are excellent! They will reduce the binding code, which is already huge.

To make it work with Micropython bindings, just need to make sure that:

  • These private headers are never included from public headers. Only from C files or other private headers
  • A private struct pointer is never passed as void * but as Struct *, where Struct is declared on a public header as forward declaration typedef struct Struct Struct;

amirgon avatar Oct 13 '22 19:10 amirgon

It's a long and a bit convoluted sentence... I think that two separate "bullets" that present each option might be clearer.

What about this?

  • When callbacks can be registered for a struct that struct must have a void * user_data field.
  • In callback registration functions
    • the first argument should be a pointer to the struct for which the callback is registered
    • and the last parameter should be void * user_data.
  • Callbacks should receive either
    • a pointer to the struct as its first argument
    • or the same user_data which was used at registration as the last argument.
  • Callbacks not following these conventions should contain the string xcb.

But it seems it's not accurate as lv_obj event callback are registered in lv_obj_t but the callbacks receive lv_event_t * e as their only argument. So it's neither a "a pointer to the struct as its first argument " nor "the same user_data which was used at registration as the last argument" . Although the same user_data is passed to lv_event_t which was used during registration. So the callback convention part maybe should be:

  • Callbacks should receive the same user_data which was used at registration either
    • in a struct as the callbacks first argument
    • or as the callback's last argument.

If it's correct "When callbacks can be registered for a struct that struct must have a void * user_data field." is not mandatory as the user_data can be stored anywhere anyhow. The key is that the user_data used during registration should be passed to the callback in struct or directly as an argument.

To make it work with Micropython bindings, just need to make sure that: [...]

Yep, it works like this.

kisvegabor avatar Oct 15 '22 18:10 kisvegabor

  • When callbacks can be registered for a struct that struct must have a void * user_data field.

I'm not sure it's clear what "callbacks can be registered for a struct" means.
Because callbacks don't necessarily relate to structs. Even when they do, the relation can differ from case to case.

  • In callback registration functions

    • the first argument should be a pointer to the struct for which the callback is registered
    • and the last parameter should be void * user_data.

This is not true. You don't need both first argument to be a pointer to the struct and the last parameter to be void * user_data.

You either need to pass the struct (which contains user_data), or pass user_data as last argument, regardless of struct.

But it seems it's not accurate as lv_obj event callback are registered in lv_obj_t but the callbacks receive lv_event_t * e as their only argument.

When events are registered in lv_obj_add_event_cb, the last argument is user_data.
When they are called, the same user_data is passed in lv_event_t.
This is possible only because user_data is copied correctly. This has special handling in LVGL on event_send_core, without that - it wouldn't work.

Callbacks should receive the same user_data which was used at registration either

  • in a struct as the callbacks first argument
  • or as the callback's last argument.

Correct but not complete, it doesn't say how user_data is passed during registration.

I think that to make it accessible and simple for the users, we can simply present the two options:

  • A pointer to a struct is the first argument of both registration and callback. That struct must contain void * user_data field.

Or

  • void * user_data is provided as the last argument of both registration and callback.

It's true that we can mix these two options, as long as user_data is passed from registration to callback, but maybe we can omit that from the short-version of the rules for the sake of simplicity. On the longer explanation we can give the lv_event_t example for mixing the options.

Some short and simple examples could be very useful to clarify everything, I think.

The key is that the user_data used during registration should be passed to the callback in struct or directly as an argument.

Yes, I agree that this is the key. It's good to mention this, but it's not enough.
We must also explain how a "registration" should look like and how a "callback" should look like.
Because the binding script expects a specific "template" of registration and callback prototypes, where arguments must be at a certain position (struct must come first, user_data last), and user_data must be named exactly so, etc.

To make it work with Micropython bindings, just need to make sure that: [...]

Yep, it works like this.

Great!! :+1:

amirgon avatar Oct 16 '22 22:10 amirgon

Okay, so it seems it can be quite simple in the end. Together with the other points and conventions:

Be sure the following conventions are followed:

  • [ ] Follow the Styling guide
  • [ ] Prefer enums instead of macros. If inevitable to use defines export them with LV_EXPORT_CONST_INT(defined_value) right after the define.
  • [ ] In function arguments prefer type name[] declaration for array parameters instead of type * name
  • [ ] Use typed pointers instead of void * pointers
  • [ ] Do not malloc into a static or global variables. Instead declare the variable in LV_ITERATE_ROOTS list in lv_gc.h and mark the variable with GC_ROOT(variable) when it's used.
  • [ ] Widget constructor must follow the lv_<widget_name>_create(lv_obj_t * parent) pattern.
  • [ ] Widget members function must start with lv_<modul_name> and should receive lv_obj_t * as first argument which is a pointer to widget object itself.
  • [ ] structs should be used via an API and not modified directly via their elements.
  • [ ] struct APIs should follow the widgets' conventions. That is to receive a pointer to the struct as the first argument, and the prefix of the struct name should be used as the prefix of the function name too (e.g. lv_disp_set_default(lv_disp_t * disp))
  • [ ] Functions and structs which are not part of the public API must begin with underscore in order to mark them as "private".
  • [ ] Argument must be named in H files too.
  • [ ] To register and use callbacks one of the followings needs to be followed
    • Pass a pointer to a struct as the first argument of both the registration function and the callback. That struct must contain void * user_data field.
    • The last argument of the registration function must be void * user_data and the same user_data needs to be passed as the last argument of the callback.
    • Callbacks not following these conventions should contain the string xcb. Question: Where? In their type or in their name?

Regarding the examples: We have discussed that it would be great if contributors added C and MP examples too. However imagine that if LVGL had more bindings, e.g. C++, MP, Lua, JS, Rust, etc. We can't expect to add examples for all these. Instead we periodically (monthly?) open sponsored issues to updated the examples. We can have script which creates a list of missing examples. If the conventions are followed there can't be a big issue. What do you think?

kisvegabor avatar Oct 19 '22 06:10 kisvegabor

  • Question: Where? In their type or in their name?

In the callback typedef. For example:

typedef void (*lv_anim_exec_xcb_t)(void *, int32_t);

Worth mentioning that xcb callbacks would not be available in other languages. (at least not in Micropython, but probably also not on other auto-generated bindings that need to keep track of callback context without libffi)

However imagine that if LVGL had more bindings, e.g. C++, MP, Lua, JS, Rust, etc. We can't expect to add examples for all these. Instead we periodically (monthly?) open sponsored issues to updated the examples.

In reality, the only active binding we have today is Micropython, right?
Until we have bindings to multiple languages, how about asking/suggesting the contributor to provide Micropython example along with the C example? An example code is a very good way to verify that the conventions are really followed correctly.

amirgon avatar Oct 21 '22 21:10 amirgon

In the callback typedef. For example:

What if there is no dedicated type? E.g. void (*my_cb)(void)

In reality, the only active binding we have today is Micropython, right?

The whole picture looks like this:

  • PikaScript Its API is MP compatible.
  • Berry for Tasmota: No dedicated examples provided.
  • LVGLJS on it's way, I hope some example will be added.
  • Cpp It's stale now but as you pointed out LVGLJS is CPP in its hearth so the Cpp binding might get some traction. If it will it should also have examples.

So strictly speaking really MP is the only binding which has its own examples based on the C examples. However I hope we will have some more active binding soon. A Lua and a Rust binding would be great as well.

Until we have bindings to multiple languages, how about asking/suggesting the contributor to provide Micropython example along with the C example?

Okay, I'll promote adding MP examples until we won't have other bindings.

kisvegabor avatar Oct 23 '22 09:10 kisvegabor

What if there is no dedicated type? E.g. void (*my_cb)(void)

I think it could be a good idea to always require a typedef, for clarity.
But we could probably support void (*my_xcb)(void) as well without typedef, if needed.

Okay, I'll promote adding MP examples until we won't have other bindings.

Great!! 👍

amirgon avatar Oct 23 '22 10:10 amirgon

I think it could be a good idea to always require a typedef, for clarity.

Okay :slightly_smiling_face:

Please confirm if the list is good now. If so I'll update the PR template. (I added some missing points from the beginning of the conversation)

After that if you have interest and time the MicroPython docs could be extended with some longer descriptions about the conventions.

kisvegabor avatar Oct 24 '22 14:10 kisvegabor

Please confirm if the list is good now. If so I'll update the PR template. (I added some missing points from the beginning of the conversation)

Yes, the list is good :+1:
Please add it to the PR template.

After that if you have interest and time the MicroPython docs could be extended with some longer descriptions about the conventions.

Where should they go in the docs? Could you add a placeholder for it?

I would like to explain the most complicated topics in more details:

I can try to summarize these topics and add to the docs.
I think it would be useful to add links from the PR template to these topics on the docs.

amirgon avatar Oct 25 '22 20:10 amirgon

Please add it to the PR template.

Done: https://github.com/lvgl/lvgl/blob/master/.github/pull_request_template.md Looks a little bit long, but at least the content is clear and simple now. Let's see how it work in the practice and if needed we can restructure it to look friendlier.

Where should they go in the docs? Could you add a placeholder for it?

Just add a new section after "How can I use it?" like "Code conventions for the MicroPython binding"

I would like to explain the most complicated topics in more details:

:+1:

I think it would be useful to add links from the PR template to these topics on the docs.

Sure, once when it's ready I'll add the links.

kisvegabor avatar Oct 26 '22 09:10 kisvegabor

Sure, once when it's ready I'll add the links.

@kisvegabor did you add them?
I've looked at pull_request_template.md and can't find them. Maybe I should look someplace else?

amirgon avatar Nov 04 '22 21:11 amirgon

Thanks for the ping. I've just added it: https://github.com/lvgl/lvgl/commit/8dc7fc918a000826ad8a1be7d93da4a7bd60c4ba

kisvegabor avatar Nov 05 '22 17:11 kisvegabor

Thanks for the ping. I've just added it: lvgl/lvgl@8dc7fc9

Great! thanks.

Closing this as completed.

amirgon avatar Nov 05 '22 21:11 amirgon

I think it's a great step to avoid a lot of issues in the future.

kisvegabor avatar Nov 05 '22 21:11 kisvegabor