Testing against GNU Emacs

Open CeleritasCelery opened this issue 1 year ago • 22 comments

We are striving to be “bug compatible” with GNU Emacs, insofar as it makes sense to do so. We might break with some behavior if it is obscure enough or if breaking adds enough value to justify it. But the right answer to “what should this function do?” is almost always “whatever GNU Emacs does”.

Given that, we would like to have a way to test against Emacs and compare behavior. This issue is to brainstorm the best way to do that.

Currently the plan is to create a new Rust binary that can feed a test file into Rune and GNU Emacs and make sure they get the same output. If one throws an error, so does the other. And the result of each expression is the same.
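A minimal sketch of the comparison rule described above, assuming each implementation's run has already been reduced to one outcome per expression. The `Outcome` type and both function names are hypothetical, and treating any two errors as matching (without comparing messages) is an assumption, not something the plan specifies.

```rust
/// Result of evaluating one expression in either implementation.
/// (Hypothetical type for illustration.)
#[derive(Debug, Clone, PartialEq)]
pub enum Outcome {
    /// Printed representation of the returned value.
    Value(String),
    /// The expression signaled an error; the message is kept for reporting only.
    Error(String),
}

/// Two outcomes agree when both are errors, or both are values with the
/// same printed form. Error messages are deliberately not compared
/// (an assumption of this sketch).
pub fn outcomes_agree(rune: &Outcome, emacs: &Outcome) -> bool {
    match (rune, emacs) {
        (Outcome::Value(a), Outcome::Value(b)) => a == b,
        (Outcome::Error(_), Outcome::Error(_)) => true,
        _ => false,
    }
}

/// Return the indices of expressions whose outcomes differ. If one run
/// produced fewer outcomes, every unmatched index counts as a mismatch.
pub fn mismatches(rune: &[Outcome], emacs: &[Outcome]) -> Vec<usize> {
    let longest = rune.len().max(emacs.len());
    (0..longest)
        .filter(|&i| match (rune.get(i), emacs.get(i)) {
            (Some(r), Some(e)) => !outcomes_agree(r, e),
            _ => true,
        })
        .collect()
}
```

Reporting indices rather than a single pass/fail flag makes it easier to point at the exact expression in the test file that diverged.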

This can be expanded to include fuzzing/property testing. There could be some code to parse each built-in function in Rune and get the type signature. We could then test the arity, accepted types, and random values against GNU Emacs. This would help flush out edge cases and differences in behavior. It could also help us catch changes between major version upgrades of GNU Emacs.

We could also create dedicated fuzzers for specific functionality. For example, we have some code to convert a Lisp regex to a Rust regex. We could send in random regexes and ensure that if Emacs considers one valid, then it is also converted to a valid Rust regex. Another example is printing: ensuring that the printed representation of everything is the same.
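The regex property is a one-way implication, which can be sketched as below. `Conversion`, `conversion_property_holds`, and `counterexamples` are hypothetical names; the sketch assumes the caller has already asked Emacs whether each regex is valid and has already checked that a converted regex compiles on the Rust side.

```rust
/// What the Rune side reports after trying to convert one Lisp regex.
/// (Hypothetical type; `Converted` is assumed to mean the result also
/// compiled as a Rust regex.)
#[derive(Debug, Clone, PartialEq)]
pub enum Conversion {
    Converted(String),
    Failed(String),
}

/// The property from the text: if Emacs accepts the regex, the conversion
/// must succeed. Nothing is required when Emacs rejects the regex.
pub fn conversion_property_holds(emacs_accepts: bool, rune: &Conversion) -> bool {
    !emacs_accepts || matches!(rune, Conversion::Converted(_))
}

/// Collect the input regexes that violate the property.
pub fn counterexamples<'a>(cases: &'a [(&'a str, bool, Conversion)]) -> Vec<&'a str> {
    cases
        .iter()
        .filter(|(_, accepts, conv)| !conversion_property_holds(*accepts, conv))
        .map(|(regex, _, _)| *regex)
        .collect()
}
```

Note that a regex Emacs rejects but Rune converts is not flagged here; whether that case should also count as a mismatch is a separate decision.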

CeleritasCelery avatar Dec 06 '23 21:12 CeleritasCelery

> Currently the plan is to create a new rust binary that can feed a test file into Rune and GNU Emacs and make sure they get the same output. If one throws an error, so does the other. And the result of each expression is the same.

I'm thinking about this. I really like this GitHub Action to set up Emacs at a specific version. We can then prop test by generating some elisp that the emacs command on CI can run. There are different ways to do it, but probably the easiest is to generate an .el file and have emacs run it with -q -x. We can generate an .el file from all the defuns we generate. It could get complicated fast if, say, defuns need to run in a specific order to work correctly. I would imagine GNU Emacs would fail as well in that case, so the output would still be consistent with Rune.
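One way the generated .el file could look, as a rough sketch. `render_script` is a hypothetical helper, and wrapping every expression in `condition-case` plus `print` is an assumption of this sketch (it presumes both implementations evaluate those two forms identically); how the file is then handed to each binary is left out.

```rust
/// Render expressions into the text of an `.el` script. Each expression is
/// wrapped in `condition-case` so that an error is printed as
/// `(error SYMBOL)` instead of aborting the rest of the run.
/// (Hypothetical helper; the wrapping scheme is an assumption.)
pub fn render_script(exprs: &[&str]) -> String {
    let mut out = String::new();
    for expr in exprs {
        out.push_str("(print (condition-case err ");
        out.push_str(expr);
        out.push_str(" (error (list 'error (car err)))))\n");
    }
    out
}
```

Printing one form per line keeps the two outputs easy to diff, and catching errors per expression means a failure early in the file does not hide results further down.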

Qkessler avatar Dec 07 '23 08:12 Qkessler

@Ki11erRabbit As you have seen with string-version-lessp (#78), it can be hard to match the behavior of GNU Emacs when porting code, even when you have the source code in front of you. The idea in this issue is to create a separate utility to compare Rune and GNU Emacs with property testing to help flush out issues. I have started working on this with a little CLI tool here. All it does right now is check whether the same functions exist, but we could expand it to generate input to the functions and compare the outputs. I have started writing it in Rust with proptest, but we could in theory write it in any other language (like Python with hypothesis).

If you are interested, this might be something you could tackle, because you have already seen how hard it is to get the functionality right. I have hit many behavior mismatches before, but usually they are deep in some other code and it takes me hours to find the source. With this utility we could fuzz new functions as they are created and hopefully make the quality much higher and bugs easier to flush out. I am sure our current defuns are loaded with issues that we just haven't seen yet. Let me know what you think.

CeleritasCelery avatar Jun 06 '24 20:06 CeleritasCelery

> If you are interested, this might be something you could tackle, because you have already seen how hard it is to get the functionality right. I have hit many behavior mismatches before, but usually they are deep in some other code and it takes me hours to find the source. With this utility we could fuzz new functions as they are created and hopefully make the quality much higher and bugs easier to flush out. I am sure our current defuns are loaded with issues that we just haven't seen yet. Let me know what you think.

I wouldn't mind helping, although I will be very busy the next 2 weeks or so. But after that I should be able to help with this.

Ki11erRabbit avatar Jun 06 '24 21:06 Ki11erRabbit

I am now free to help. What would you like me to do @CeleritasCelery?

Ki11erRabbit avatar Jun 24 '24 22:06 Ki11erRabbit

It depends on what you want to do 😄. If working on this particular item interests you, then take a look here at what I have started with. If you wanted to start over, that is fine too. Essentially this code runs both Rune and GNU Emacs and compares the output. Right now it only sees if they both have the same functions defined, but we can expand it to do more. Some next steps:

  1. We need to extract more information about the function definitions from the Rust source, probably using syn. This includes the number of arguments, number of optional arguments, return type, and argument types.

  2. Make sure that function arity matches between the two implementations (using func-arity).

  3. Use a prop testing library to generate random inputs and ensure the functions return the same outputs. If we know the types a function expects we can narrow down the types that we generate to test more interesting properties. For example we could test a bunch of random strings with string-version-lessp to help catch any corner cases that we missed.
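Step 2 could be sketched roughly as follows, assuming the Emacs side prints the result of `func-arity` (a `(MIN . MAX)` pair where MAX may be the symbol `many`) and the Rune side counts arguments from the Rust signature. The type and function names are hypothetical, and the sketch parses plain strings rather than using syn.

```rust
/// Upper bound of an arity, mirroring the `(MIN . MAX)` pair returned by
/// `func-arity`, where MAX may be the symbol `many`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MaxArgs {
    Exactly(usize),
    Many,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Arity {
    pub min: usize,
    pub max: MaxArgs,
}

/// Arity implied by a Rust signature: `required` mandatory arguments,
/// `optional` `Option<_>` arguments, and whether a trailing slice
/// (such as `&[Number]`) collects the remaining arguments.
pub fn arity_from_signature(required: usize, optional: usize, has_rest: bool) -> Arity {
    let max = if has_rest {
        MaxArgs::Many
    } else {
        MaxArgs::Exactly(required + optional)
    };
    Arity { min: required, max }
}

/// Parse the printed form of a `func-arity` result, e.g. `(1 . 2)` or
/// `(0 . many)`. Returns `None` for anything else, including a MAX of
/// `unevalled` (special forms), which this sketch does not model.
pub fn parse_func_arity(printed: &str) -> Option<Arity> {
    let inner = printed.trim().strip_prefix('(')?.strip_suffix(')')?;
    let (min, max) = inner.split_once(" . ")?;
    let min = min.trim().parse::<usize>().ok()?;
    let max = match max.trim() {
        "many" => MaxArgs::Many,
        n => MaxArgs::Exactly(n.parse::<usize>().ok()?),
    };
    Some(Arity { min, max })
}
```

With both sides reduced to an `Arity`, the check itself is a plain equality comparison, and a `None` from the parser can be reported separately as "could not determine arity".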

Of course this all depends on if this is something you want to work on. I think it would be a good task because it is open and does not require a lot of context about the current system. But if there is something else you are interested in more, let me know.

CeleritasCelery avatar Jun 25 '24 00:06 CeleritasCelery

Sure I get to work on something. It should keep me busy.

Ki11erRabbit avatar Jun 25 '24 01:06 Ki11erRabbit

I have a few questions after thinking about the problem.

How aware do we want the tester to be of the types? If we just take an object, we don't actually know what the type is. It would be nice to have a list of possible types that a function could take, even if the list is somewhat incomplete.

Should I make a type that represents a function and use that to generate arbitrary function calls that we could test against Emacs? If so it will rely on the above information to provide arbitrary input.
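A rough sketch of what such a function type might look like. Everything here is hypothetical: `FunctionSpec`, `ArgType`, and the tiny xorshift generator (used only so the example needs no crates; the real tool would presumably lean on proptest instead).

```rust
/// Tiny deterministic xorshift generator so the sketch needs no crates.
pub struct Rng(u64);

impl Rng {
    pub fn new(seed: u64) -> Self {
        Rng(seed.max(1)) // the xorshift state must be non-zero
    }
    pub fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
    /// Value in `0..n`; `n` must be non-zero.
    pub fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

/// The little type information this sketch models.
#[derive(Debug, Clone, Copy)]
pub enum ArgType {
    Integer,
    Boolean,
    Unknown,
}

/// A function under test: its Lisp name and argument types.
pub struct FunctionSpec {
    pub name: &'static str,
    pub required: Vec<ArgType>,
    pub optional: Vec<ArgType>,
}

fn arbitrary_arg(ty: ArgType, rng: &mut Rng) -> String {
    match ty {
        ArgType::Integer => (rng.below(201) as i64 - 100).to_string(),
        ArgType::Boolean => (if rng.below(2) == 0 { "nil" } else { "t" }).to_string(),
        // Without type information, fall back to a small fixed pool of values.
        ArgType::Unknown => ["nil", "0", "\"\"", "'sym"][rng.below(4) as usize].to_string(),
    }
}

impl FunctionSpec {
    /// Render one call with every required argument and a random number
    /// of the optional ones, in order.
    pub fn arbitrary_call(&self, rng: &mut Rng) -> String {
        let extra = rng.below(self.optional.len() as u64 + 1) as usize;
        let mut call = format!("({}", self.name);
        for &ty in self.required.iter().chain(self.optional.iter().take(extra)) {
            call.push(' ');
            call.push_str(&arbitrary_arg(ty, rng));
        }
        call.push(')');
        call
    }
}
```

The `Unknown` variant is where the missing type information shows up: the narrower the type, the more targeted the generated values can be.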

Ki11erRabbit avatar Jun 25 '24 18:06 Ki11erRabbit

> How aware do we want the tester to be of the types? If we just take an object, we don't actually know what the type is. It would be nice to have a list of possible types that a function could take, even if the list is somewhat incomplete.

Agreed. The more specific the types, the more useful the input we can generate. Many builtin functions use specific types like &str and usize that we can extract, but for ones that take Object we don't have that info. We need to think of some way to include it. Maybe through a comment, annotation, or attribute?

> Should I make a type that represents a function and use that to generate arbitrary function calls that we could test against Emacs? If so it will rely on the above information to provide arbitrary input.

I think that is a good approach.

CeleritasCelery avatar Jun 25 '24 19:06 CeleritasCelery

While working on the tester I thought of a way to provide more type information.

I think that an attribute might be best if it can do these things:

  1. State the argument's position (as an integer). This way we can indicate multiple types for the same argument.
  2. Give the type name. This is for convenience.
  3. State whether or not the argument is optional.

I think that this would make parsing with Syn much easier.
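To make the three fields concrete, here is a sketch of the data such an attribute might carry once parsed. `ArgHint` and `types_by_position` are hypothetical names; no attribute syntax is proposed here.

```rust
use std::collections::BTreeMap;

/// One hint as described in the list above: argument position,
/// type name, and whether the argument is optional.
/// (Hypothetical type for illustration.)
#[derive(Debug, Clone, PartialEq)]
pub struct ArgHint {
    pub position: usize,
    pub type_name: String,
    pub optional: bool,
}

/// Group hints by argument position, so that one argument can carry
/// several candidate types.
pub fn types_by_position(hints: &[ArgHint]) -> BTreeMap<usize, Vec<&str>> {
    let mut map: BTreeMap<usize, Vec<&str>> = BTreeMap::new();
    for hint in hints {
        map.entry(hint.position)
            .or_default()
            .push(hint.type_name.as_str());
    }
    map
}
```

Repeating the position across hints is what lets a single argument accept, say, both a string and a symbol.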

Ki11erRabbit avatar Jun 28 '24 03:06 Ki11erRabbit

Most of that information should already be there.

The Rust types should map fairly cleanly to the lisp types.

pub(crate) fn string_lessp<'ob>(
    string1: StringOrSymbol<'ob>,
    string2: StringOrSymbol<'ob>,
) -> Result<bool> {

This tells us that the argument type is a string or symbol (which means we can test a string against it)

pub(crate) fn less_than(number: Number, numbers: &[Number]) -> bool {

This tells us that the arguments are numbers (either int or float)

Optional arguments from lisp are Option in Rust.

pub(crate) fn require<'ob>(
    feature: &Rto<Gc<Symbol>>,
    filename: Option<&Rto<Gc<&LispString>>>,
    noerror: Option<()>,
    env: &mut Rt<Env>,
    cx: &'ob mut Context,
) -> Result<Symbol<'ob>> {

Here we know that feature is a symbol, filename is an optional string, and noerror is just optional (nil or t). This is the reason we use Option<()> for optionals instead of bool. It lets us distinguish between required boolean flags and optional lisp values.
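The `Option` convention can be read off a parameter's type text, as in this sketch. It works on plain strings as a stand-in for a real parser such as syn, and `ParamKind` and `classify_param` are hypothetical names. It does not filter out parameters like `env` and `cx`, which are not Lisp arguments.

```rust
/// How a parameter participates in the Lisp-level signature.
/// (Hypothetical type for illustration.)
#[derive(Debug, PartialEq, Eq)]
pub enum ParamKind<'a> {
    /// A mandatory argument of the given Rust type.
    Required(&'a str),
    /// An optional argument whose inner Rust type is given.
    Optional(&'a str),
    /// `Option<()>`: only nil versus non-nil matters.
    Flag,
}

/// Classify a parameter from the text of its Rust type.
pub fn classify_param(ty: &str) -> ParamKind<'_> {
    let ty = ty.trim();
    match ty
        .strip_prefix("Option<")
        .and_then(|rest| rest.strip_suffix('>'))
    {
        Some(inner) if inner.trim() == "()" => ParamKind::Flag,
        Some(inner) => ParamKind::Optional(inner.trim()),
        None => ParamKind::Required(ty),
    }
}
```

Applied to the `require` signature above, the three Lisp-visible parameters come out as required, optional, and flag respectively.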

Let me know if I am not understanding your question.

CeleritasCelery avatar Jun 28 '24 19:06 CeleritasCelery

I think you are on point. Could you maybe make a list of all of the types and their equivalents in elisp?

Ki11erRabbit avatar Jun 29 '24 20:06 Ki11erRabbit

Sure thing.

| Rust Type | Elisp Type |
| --- | --- |
| usize | integer |
| i64 | integer |
| isize | integer |
| f64 | float |
| Number | integer or float |
| &str | string |
| StringOrSymbol | string |
| bool | t or nil |
| List | nil or cons |
| Function | function |
| Option<()> | nil or non-nil |
| ByteString | unibyte-string |
| LispVector | vector |
| LispHashTable | hash-table |
| Symbol | symbol |
| Cons | cons |
| Record | record |
| ByteFn | byte-code-function |
| SubrFn | subr |
| Buffer | buffer |

Some of these like string, integer, and float will be easiest to generate data for.
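The table translates directly into a lookup, sketched below together with a helper for one of the "easy" types. `elisp_type` and `elisp_string` are hypothetical names; the string helper assumes that escaping `\` and `"` is enough for the generated literals to read back correctly.

```rust
/// Look up the Elisp type for a Rust type name, following the table above.
/// Returns `None` for types the table does not cover (such as `Object`).
pub fn elisp_type(rust_type: &str) -> Option<&'static str> {
    Some(match rust_type {
        "usize" | "i64" | "isize" => "integer",
        "f64" => "float",
        "Number" => "integer or float",
        "&str" | "StringOrSymbol" => "string",
        "bool" => "t or nil",
        "List" => "nil or cons",
        "Function" => "function",
        "Option<()>" => "nil or non-nil",
        "ByteString" => "unibyte-string",
        "LispVector" => "vector",
        "LispHashTable" => "hash-table",
        "Symbol" => "symbol",
        "Cons" => "cons",
        "Record" => "record",
        "ByteFn" => "byte-code-function",
        "SubrFn" => "subr",
        "Buffer" => "buffer",
        _ => return None,
    })
}

/// Render a Rust string as an Elisp string literal, escaping `\` and `"`.
pub fn elisp_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        if c == '"' || c == '\\' {
            out.push('\\');
        }
        out.push(c);
    }
    out.push('"');
    out
}
```

Integers can be emitted with `to_string()` as they are; floats need more care (infinities and NaN have their own Elisp syntax), which this sketch leaves out.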

CeleritasCelery avatar Jun 30 '24 03:06 CeleritasCelery

Thank you, that has been very helpful. Although, could we make an alias for Option<()>? It creates a slightly weird edge case in my code. It would also make the type much clearer.

Ki11erRabbit avatar Jun 30 '24 17:06 Ki11erRabbit

I am fine with that. What should we call the type?

CeleritasCelery avatar Jun 30 '24 22:06 CeleritasCelery

> I am fine with that. What should we call the type?

I am thinking something like AnyOrNil or something along those lines.

Ki11erRabbit avatar Jun 30 '24 23:06 Ki11erRabbit

I added a type alias called OptionalFlag for that type.

CeleritasCelery avatar Jul 01 '24 14:07 CeleritasCelery

> I added a type alias called OptionalFlag for that type.

Thank you

Ki11erRabbit avatar Jul 01 '24 16:07 Ki11erRabbit

I thought I would give an update. I have managed to get it to generate a very large test file that has random values. There are still some things to work out though.

I have one concern. I don't know how to handle randomly generated functions. Right now they have a random arity < 0 and return nil. I think that they should return something other than nil sometimes.

Ki11erRabbit avatar Jul 02 '24 03:07 Ki11erRabbit

That’s great to hear! Feel free to open a PR.

As far as functions go, I think we will need more info on what kind of function is needed. Otherwise you won’t actually be testing interesting properties of the defun. We could always just skip them for now. Maybe some attribute or comment could provide info on what kind of function to generate.

CeleritasCelery avatar Jul 02 '24 04:07 CeleritasCelery

The only things left are to make it so that lists actually have elements in them, make a decent cmdline interface, and set up a test harness.

After I fix the list bug and give it a cmdline interface should I submit a PR?

Ki11erRabbit avatar Jul 02 '24 05:07 Ki11erRabbit

Yes please!

CeleritasCelery avatar Jul 02 '24 15:07 CeleritasCelery

I also thought of a way to solve the function arity issue. We could just make a type alias to Function that specifies the arity.
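One possible reading of this idea, as a sketch: the tester recognizes an alias name that encodes the arity and emits a lambda with that many parameters. The `FunctionN` naming scheme and both helper names are hypothetical; nothing in the thread fixes what the alias would be called.

```rust
/// Read the arity from a hypothetical alias name such as `Function2`.
/// A bare `Function` carries no arity and yields `None`.
pub fn arity_from_alias(alias: &str) -> Option<usize> {
    alias.strip_prefix("Function")?.parse().ok()
}

/// Render a lambda that accepts exactly `arity` arguments. It returns its
/// first argument when it has one and the constant `t` otherwise, so the
/// result is not always nil.
pub fn lambda_with_arity(arity: usize) -> String {
    let params: Vec<String> = (0..arity).map(|i| format!("a{i}")).collect();
    let body = params.first().map(String::as_str).unwrap_or("t");
    format!("(lambda ({}) {})", params.join(" "), body)
}
```

Returning the first argument also touches on the earlier concern about generated functions always returning nil, although richer bodies would still need more information about what the defun expects.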

Ki11erRabbit avatar Jul 02 '24 15:07 Ki11erRabbit