regex test marker traits more rigorously on public API types

Since the regex crate uses interior mutability and a bit of synchronization, it follows that marker traits like Send, Sync, UnwindSafe and RefUnwindSafe are quite relevant and can change based on internal implementation details. #749 for example, removed the thread_local dependency and got a bit more rigorous about these marker traits on the main Regex/RegexSet types, but stopped there.

It turns out other types are impacted too. @kangalioo noted that, for example, Send was not implemented on the CaptureMatches iterator in regex 1.3.9, but it is now implemented in regex 1.4.5. I'm actually not sure why it wasn't implemented before. The regex crate doesn't use any non-Send stuff, so I'm not sure where that was coming from. I haven't investigated yet.

Either way, this is a compatibility hazard. Going from !Send to Send is fine, but it seems just as likely that we could regress too. The only way I can think to solve this is to assert expected marker traits for all public API types. It seems kind of absurd and tedious, but I don't have any better ideas.

Apr 08 '21 12:04 BurntSushi

Can I take a jab at implementing these tests? If it's ok, can you give me some pointers on where I can find similar tests?

Jun 30 '21 16:06 alexfertel

I think tests/test_default.rs is the place. You'll want to look for oibits, which is legacy terminology for marker traits that meant "Opt-In Built-In Traits."

Ideally there would be some way to write a test (maybe using shell script or Python, not sure) that would fail in the case of a public API type that doesn't have an oibit test. That way, if a new public API type is added, we're forced to add an oibit test for it. But maybe this is too much machinery.

Jun 30 '21 17:06 BurntSushi

Thank you for the quick response!

So, if I understood correctly, we want to avoid having to repeat groups of assertions like:

    assert_send::<Regex>();
    assert_sync::<Regex>();
    assert_unwind_safe::<Regex>();
    assert_ref_unwind_safe::<Regex>();
    assert_send::<RegexBuilder>();
    assert_sync::<RegexBuilder>();
    assert_unwind_safe::<RegexBuilder>();
    assert_ref_unwind_safe::<RegexBuilder>();

by writing a script that takes a list (or somehow knows about every public API type) and generates the list of assertions?

Sounds good to me, and it is very interesting. Let me know if this is the case.

Jun 30 '21 17:06 alexfertel

While avoiding repetition is a good idea, that should probably be done with a macro.

The purpose of the script is not about repetition. It's about automation. Consider the case of when a new public API type is added. How does that person know to go add it to the oibit traits? They don't. It would be good to have a test that fails if a public API type exists and is not in the oibit test.

With that said, I do not want anything that is terribly complex, difficult to maintain or brings in new dependencies. If there isn't a simple solution for it, then it might not be worth it. If you do want to pursue this extra automation step, then please discuss your plan with me before you invest too much effort.

But the main objective is to get oibit tests for each public API type.

Jun 30 '21 18:06 BurntSushi

Yeap, I agree. I also thought about a macro but wasn't sure. 😅

Ok, so, then this is divided into two tasks:

Tasks

Tedious repetition of the group of assertions for each public API type 👍🏻 (Baseline)
Some way of scanning public API types so we can generate tests that fail when types aren't in the oibit test. (Needs research)
Add a macro to avoid repetition. (I think we don't need that, see Ideas)

Constraints for the automation:

Reasonably simple
Maintainable
Zero dependencies

Ideas

My idea for task 2 right now is to check how does documentation for types gets generated, and maybe somehow extract the types from there? I'm pretty new to Rust so I don't know what introspection abilities it has.

I think task 3 might be avoided by doing something like:

fn assert_oibits<T: Send + Sync + UnwindSafe + RefUnwindSafe>() {}

assert_oibits::<Regex>();

Let me know if all of this makes sense.

Jun 30 '21 21:06 alexfertel

What I got for task 2 so far is:

Writing a Bash or Python script that generates docs and parses the Auto Trait Implementations section of each type to see if it implements what we want.
Adding a separate crate that depends on some parser, parses the lib and generates assertions.
static_assertions

Point 2 seems way overcomplicated. Points 2 and 3 add dependencies. So, I think the best one, right now, is point 1.

Jul 01 '21 00:07 alexfertel

Combining the marker traits into one routine is a good approach. I think all public API types in this crate should satisfy all four of those? Hmm, maybe not, the iterators might not.

Ideally, rustdoc (perhaps through cargo doc) could be convinced to just give us a simple list of every public API type in a crate. @GuillaumeGomez or @jyn514, is that possible?

Jul 01 '21 23:07 BurntSushi

@BurntSushi you might be able to get something to work with cargo rustdoc -- -Z unstable-options --output-format json. I'm not intimately familiar with the JSON schema so you'll need to play around with it a bit.

Jul 01 '21 23:07 jyn514

@jyn514 Thank you! @alexfertel perhaps give that a whirl?

Jul 01 '21 23:07 BurntSushi

See also the ugly hack tokio uses (and the discussion about showing this in rustdoc): https://github.com/rust-lang/rfcs/pull/2963#issuecomment-669344913

Jul 01 '21 23:07 jyn514

Yeap, I'll give it a try!

Jul 01 '21 23:07 alexfertel

@BurntSushi Yeap, I managed to get the public types from the generated JSON. 😁 🎉

I'll try to have a Python script working in the next couple of days, and I'll submit a PR. Would that be ok?

Jul 02 '21 04:07 alexfertel

SGTM! Thanks. :)

Jul 02 '21 10:07 BurntSushi

@BurntSushi I will use the function assert_oibits in the generated tests. The reason being that while inspecting the generated docs, I found that, as of today, they are implemented by all of the public API types (I am not sure if this is the correct terminology).

fn assert_oibits<T: Send + Sync + UnwindSafe + RefUnwindSafe>() {}

Jul 03 '21 17:07 alexfertel

Ah that's great. Makes it easy.

I suppose it would be good to switch from oibits to marker_traits. So, assert_marker_traits instead of assert_oibits.

Jul 03 '21 17:07 BurntSushi

Well, scratch that last comment, I generated the tests and some of the types aren't Sync, as you thought. The issue is that I was checking for the existence of the trait as a child node in the JSON. However, there is an is_negative flag that I was ignoring.

My idea was to check all the traits for every type, but now it's not so easy. We're not solving the problem of "Having a test that fails if a public API type exists and is not in the oibit test", because each time we run the script, the tests have to be generated based on the traits implemented by the types, not just the types.

The solution I see is the following: The first time we generate the tests, we maintain a map of the traits implemented by the types. We can only insert to this map if there are new types exposed by the crate.

Do you see another way of solving this?

Jul 03 '21 21:07 alexfertel

@alexfertel I think the key simplification here is that we don't need a robust script to generate the tests. Technically, writing them by hand would be just fine. What we need is way to check for the existence of a test. Since we control what the tests look like, that seems solvable with grep. If a test doesn't exist for a particular public API type, then CI should fail. That's it. We don't need a script that re-generates the tests. A human can add them incrementally. In particular, the regex public API experiences very few changes. For example, I don't think there have been any new public API types added in the last couple years, so there's no need to automate that part. Maybe we automate the initial creation of it, and it would be nice to at least document a key shell command you used to do that piece. But for example, you could assume every type implements all four marker traits, and then just hand massage the types that don't. That's totally fine.

Jul 03 '21 22:07 BurntSushi

regex regex copied to clipboard

test marker traits more rigorously on public API types

Tasks

Constraints for the automation:

Ideas

regex
regex copied to clipboard