ModSecurity icon indicating copy to clipboard operation
ModSecurity copied to clipboard

Drop YAJL dependency

Open mikelolasagasti opened this issue 1 year ago • 32 comments

yajl library has been unmaintained upstream[1] since 2015. Last published release contians multiple CVEs (CVE-2023-33460, CVE-2022-24795, CVE-2017-16516) and fixes have had to be carried by downstream distributions and third parties. This is not an ideal situation. While a fork exists[2], it is unclear how widely adopted or blessed it is downstream.

As the maintainer of libmodsecurity for Fedora[3] and EPEL, I recently found that the yajl library will not be shipped with RHEL 10. This means a new maintainer would be required to add it to EPEL. The previous maintainer for RHEL & EPEL has recommended moving away from yajl[4].

Currently, libmodsecurity can be built without yajl, but I understand that making it mandatory is considered desirable, as per #3144 and #3151.

Given the security concerns and the upstream status of yajl, I recommend opening a discussion on dropping yajl as a dependency and exploring alternative JSON libraries, such as JSON-C, which is actively maintained and more widely adopted.

[1] https://github.com/lloyd/yajl [2] https://github.com/robohack/yajl/ [3] https://src.fedoraproject.org/rpms/libmodsecurity [4] https://lists.fedoraproject.org/archives/list/[email protected]/message/YPFHPOKAND3RZR7ZKWTDHUQEESG6IUJ3/

mikelolasagasti avatar Nov 26 '24 12:11 mikelolasagasti

Agree, we should change.

What do you think about the "official" YAML library?

airween avatar Nov 26 '24 12:11 airween

Note, I added label [2.x] too, because mod_security2 is also affected.

airween avatar Nov 26 '24 12:11 airween

This does not sound good. Thank you for bringing this to the attention @mikelolasagasti.

dune73 avatar Nov 26 '24 12:11 dune73

What do you think about the "official" YAML library?

I'm the Fedora yajl maintainer quoted in the link above. While YAML may be a superset of JSON, I'd not recommend using a YAML parser instead of a native JSON parser. It'll perform worse, and potentially open the code up to attacks that are impossible with pure JSON (aka the "Billion laughs attack" in YAML context). I can vouch for using json-c as an alternative to yajl, as that's what was successfully adopted in projects I work on. Be slightly cautious about jansson - when we previously tried it, we found it was not able to cope with the full numeric range of the uint64 type - if that's important to modsecurity's needs.

berrange avatar Nov 26 '24 14:11 berrange

Hi @berrange,

While YAML may be a superset of JSON, I'd not recommend using a YAML parser instead of a native JSON parser.

You're right, that was a mistake from me - sorry, and thank you.

airween avatar Nov 26 '24 15:11 airween

Thank you for this very valuable input @berrange. That puts the replacement discussion on a far better base.

dune73 avatar Nov 26 '24 15:11 dune73

These are the libraries that are reported (by their authors and third-parties) as the fastest:

  • https://github.com/stephenberry/glaze
  • https://github.com/simdjson/simdjson
  • https://ibireme.github.io/yyjson/

It would be interesting to check their functionalities for our intended use:

  • validation
  • callback to store ARGS (or should we avoid that and use only the resulting sJSON onjects?)
  • memory footprint

marcstern avatar Nov 27 '24 07:11 marcstern

There are a few subtle behavior quirks of yajl that should be examined for alternative libraries too. And then documented in case.

The behavior with empty request body for example. From the back of my head I am no longer sure if yajl rejects that and other parsers are OK with it or if it's the other way around. But I've seen this problem in production before.

dune73 avatar Nov 27 '24 07:11 dune73

This would definitely need extensive test cases

marcstern avatar Nov 27 '24 15:11 marcstern

Just a quick remark for this issue:

  • I started to make a small tool which can help to compare different JSON implementations; still work in progress
  • I've contacted with YAJL developer but after (almost) two weeks still no reply
    • Perhaps we should contact other contributors and fork YAJL repository, then fix the available issues...?

Other ideas?

airween avatar Feb 12 '25 10:02 airween

I've contacted with YAJL developer but after (almost) two weeks still no reply

AFAICT they haven't replied to anyone asking about YAJL on github in many years. I see no notable github activity on their account in 10 years.

* Perhaps we should contact other contributors and fork YAJL repository, then fix the available issues...?

The thing about yajl forks is that there are already many to choose from, each with a random selection of changes, but none seem to gain critical mass/attention.

Personally I'd just recommend letting yajl fade out. There are already two actively maintained alternative plain C JSON libraries - jansson & json-c - json-c slightly more active, as well as glib-json for apps which happen to use GLib. Between them they ought to be able to satisfy JSON needs for C applications, unless there's an unavoidable need for historical bug compatibility with YAJL. Even bug compatibility is questionable since JSON ought to be used/consumed in a way that ensures it is interoperable across impls, so it doesn't feel like further investment in YAJL is a long term benefit.

berrange avatar Feb 12 '25 10:02 berrange

AFAICT they haven't replied to anyone asking about YAJL on github in many years. I see no notable github activity on their account in 10 years.

I saw that in some place but gave a try...

* Perhaps we should contact other contributors and fork YAJL repository, then fix the available issues...?

The thing about yajl forks is that there are already many to choose from, each with a random selection of changes, but none seem to gain critical mass/attention.

that's said.

Personally I'd just recommend letting yajl fade out. There are already two actively maintained alternative plain C JSON libraries - jansson & json-c - json-c slightly more active, as well as glib-json for apps which happen to use GLib. Between them they ought to be able to satisfy JSON needs for C applications, unless there's an unavoidable need for historical bug compatibility with YAJL. Even bug compatibility is questionable since JSON ought to be used/consumed in a way that ensures it is interoperable across impls, so it doesn't feel like further investment in YAJL is a long term benefit.

If we don't have other chance we must switch, but honestly I'd like to avoid that. If we will switch, I do not see any reason to keep any compatibility with old bugs, I agree with you.

And thanks for following this issue.

airween avatar Feb 12 '25 12:02 airween

Just a side note: it seems like Debian package maintainer made a fork and he tries to contribute the code. This is the upstream of official Debian package.

airween avatar Feb 13 '25 14:02 airween

I made some researches in this topic, and now I'm completely disappointed. Honestly, I think it's harder to make a good decision than I thought.

Here are the libraries what I checked.

Libray Has GH organization Language Has SAX parser Speed Last commit Last release Has Debian package
json-c C slow (it uses traditional lexer and parser) 2025.01.09 2024.09.15
simdjson C++ (no C extension, need to write a wrapper) extreme fast 2025.03.06 2025.02.14
Jansson C slower than json-c 2024.03.31 2021.09.09
cJSON C fast 2024.09.23 2024.05.13
RapidJSON C++ (no C extension) fast 2025.02.05 2016.08.25
yyjson C fast 2025.03.12 2024.07.09 ✅ (After Trixie)
Ultrajson C fast 2025.03.01 2024.05.14
JSMN C fast 2021.10.14 2019.06.23
Centijson C ? 2024.01.20 -
cJSON C ? 2024.09.23 2024.05.13

I know the list is far not complete, but I haven't more time. If anyone can make similar research, please send me the results and I will add it to this sheet.

Here is a good collection of available parsers, but I'm not sure that's a complete list.

Aspects that I took into account:

  • has a GH organization: perhaps this is a bit weird aspect but learning from the current situation (YAJL developer does not response to anyone, does not merge PR's, ...) I'd be happy if I saw that the project belongs to an organization
  • language: it would be nice if the library also supported the C language; if the library's language is C++ but not native C support, we can make a wrapper, but then we bring a C++ compiler as a dependency
  • has SAX parser: YAJL uses SAX parser, I think we should stay on this method, especially in case of huge payloads
  • speed: this is a bit of a vague point; I didn't find any exact information, just asked chatGPT... but I think we should keep in mind
  • last commit and last release: I think it's obviously; these parameters shows how active the project is
  • has Debian package: yes I'm a Debian user, but I hadn't enough time to check other distributions that they have package or not; but I think it's an important indicator if the user can install dependencies from a package (especially the reason of this issue is that YAJL library no longer exists in the mentioned distribution).

If anyone has other aspects that we should consider, please leave a comment.

And any help are welcome to continue this research.

Update on 2025.03.27:

  • ȷson-c does not support SAX parsing
  • fixed Ultrajson URL
  • added JSMN
  • added Centijson
  • added cJSON

Update on 2025.03.28:

  • Janson does not support SAX parsing

Update on 2025.01.01:

  • yyjson has Debian package (after Trixie)

airween avatar Mar 13 '25 17:03 airween

To all: I've reviewed the available JSON parsers and organized them by my criteria - see this comment.

There are some important requirements that the candidate must meet:

  • must be use in native C - it's necessary for mod_security2; I don't want to use any own wrapper, because then the code will have a new dependency, a C++ compiler
  • it must have a SAX parser - in case of DOM parser, the processor builds the whole tree in memory; I think that would be a waste
  • be available as an operating system package - I think this does not need any special preparation: the original reason of this issue was that some distribution removed the YAJL packages.

Based on the sheet above (inside of the comment) there is only ONE JSON library which fit these requirements: the JSMN, but unfortunately that's - we can say - also an unmaintained codebase. The last commit was on 2021.10.14, and the last release was on 2019.06.23. This is very sad.

Actually I don't see the solution. I reviewed all listed library here (see the C libraries, you should scroll down), but I couldn't find any usable one.

Meanwhile I created a small tool which emulates the engines' behavior (especially it's closer to mod_security2's behavior). If anyone wants to try to add a new library, feel free to do that. Or anyone has an idea what library is missing from the list above, and thinks that would be a good candidate, please share that and I will add that.

The tool is here: https://github.com/digitalwave/jsonbench

Also please if you think I forget to add a usable library above, please notice me.

airween avatar Mar 31 '25 14:03 airween

yyjson seems to have an option read data using YYJSON_READ_INSITU without having to parse the whole DOM document. It's not SAX, but could be close enough for modsecurity needs?

Also, yyjson is packaged for Debian Testing and many other distributions:

  • https://tracker.debian.org/pkg/yyjson
  • https://repology.org/project/yyjson/versions

mikelolasagasti avatar Mar 31 '25 21:03 mikelolasagasti

Looks like the i3 project is also exploring alternatives to YAJL: https://github.com/i3/i3/issues/6257

mikelolasagasti avatar Mar 31 '25 21:03 mikelolasagasti

yyjson seems to have an option read data using YYJSON_READ_INSITU without having to parse the whole DOM document. It's not SAX, but could be close enough for modsecurity needs?

I have to check this, thanks. Btw if you want to take a look and/or try that, feel free to add yyjson to jsonbench tool.

Also, yyjson is packaged for Debian Testing and many other distributions:

* https://tracker.debian.org/pkg/yyjson
* https://repology.org/project/yyjson/versions

Ah, thanks - now I realized that it has been added in May of last year, so it's part of SID and testing (which is the Trixie now). But it's a good sign.

Thanks again for your help!

airween avatar Apr 01 '25 06:04 airween

Looks like the i3 project is also exploring alternatives to YAJL: i3/i3#6257

Yes, and it seems like they try yyjson too.

Thanks, now I think it worth a try.

airween avatar Apr 01 '25 06:04 airween

The tool is here: https://github.com/digitalwave/jsonbench

It seems it's not public.

mikelolasagasti avatar Apr 11 '25 14:04 mikelolasagasti

The tool is here: https://github.com/digitalwave/jsonbench

It seems it's not public.

Ah, sorry - now it's public.

airween avatar Apr 11 '25 16:04 airween

Hello! Has anyone upgraded to RHEL10 and managed to install yajl? I have been stuck for a number of days. The package is not even in EPEL

bujikun avatar Sep 05 '25 10:09 bujikun

@bujikun check this COPR I created with the builds for EPEL10: https://copr.fedorainfracloud.org/coprs/mikelo2/modsecurity-el10/

mikelolasagasti avatar Sep 05 '25 14:09 mikelolasagasti

@bujikun check this COPR I created with the builds for EPEL10: https://copr.fedorainfracloud.org/coprs/mikelo2/modsecurity-el10/

Thanks @mikelolasagasti. This issue is still on my list with priority (unfortunately I was very busy in last few months).

airween avatar Sep 05 '25 14:09 airween

Hello, If not afraid of "not recommended" repository but on official one "yajl-devel" is still in the rhel10 devel 10 repostitory : dnf install almalinux-release-devel (https://wiki.almalinux.org/repos/AlmaLinux.html)

Here is my output : dnf search yajl AlmaLinux 10 - Devel 1.0 MB/s | 463 kB 00:00
================================ Name & Summary Matched: yajl ================================ yajl.x86_64 : Yet Another JSON Library (YAJL) yajl-devel.x86_64 : Libraries, includes, etc to develop with YAJL

I was able to create my own rpm and even nginx + nginx connector (+/- with mock too)

Good luck.

tom8941 avatar Sep 06 '25 12:09 tom8941

"yajl-devel" is still in the rhel10 devel 10 repostitory

That repository is an AlmaLinux feature and it's not available for RHEL/EPEL. It can't be used to create official EPEL builds.

mikelolasagasti avatar Sep 06 '25 15:09 mikelolasagasti

Another data point from a prominent C library:

Xen 4.21.0-rc1 was released today. yajl is no longer a hard dependency. Xen will prefer json-c to yajl, if ./configure can find it.

Please test and feed back any issues you find.

gruenich avatar Oct 11 '25 07:10 gruenich

Hi @gruenich,

Another data point from a prominent C library:

Xen 4.21.0-rc1 was released today. yajl is no longer a hard dependency. Xen will prefer json-c to yajl, if ./configure can find it. Please test and feed back any issues you find.

thanks for sharing this.

Unfortunately there is a very important missing feature from json-c: it does not have a SAX style parser (see my comparative sheet). Without that, the used JSON parser parses the whole request (or response) body, without limitation (number of arguments - the payload size still can be a limit).

I think based on the comparison there aren't too much option, I think the best choice now is the RapidJSON. It's presented in most distribution (this is the biggest disadvantage of YAJL now), and has a SAX parser. The problem is the C++ dependency, but I think all distribution has a C++ compiler, so this is less problematic than there is no required dependency at all.

I really hope that I can integrate this into the source trees this year and make releases with the new parser (actually I fight with two other serious issues (beside my regular work)).

airween avatar Oct 11 '25 21:10 airween

RapidJSON had its last release in 2016, lots of open issues, and numerous unmerged pull requests. Similar to yajl, there are forks but non really took over, see https://github.com/Tencent/rapidjson/issues/2321. This will not improve the situation, it is out of the frying pan into the fire.

gruenich avatar Oct 12 '25 06:10 gruenich

RapidJSON had its last release in 2016, lots of open issues, and numerous unmerged pull requests. Similar to yajl, there are forks but non really took over, see Tencent/rapidjson#2321. This will not improve the situation, it is out of the frying pan into the fire.

Thanks for showed this thread under rapidjson's repository.

Yes, I saw the last release date when I started collecting available libraries, but then I checked the last commits too. In case of RapidJSON that was a few months before that date. I read the last comment under your mentioned issue, and I don't think it's a big problem if there is no release, but the code was constantly maintained.

Anyway, also in your mentioned issue I saw a new library was showed, that I haven't seen yet, that's the Nlohmann-JSON. Unfortunately it does not belong to any organization (like Lloyd's YAJL), but it seems it's maintained and many distribution supports it. And it seems like that also has a SAX-parser, which is very important for us.

What do you think about this, guys?

airween avatar Oct 12 '25 09:10 airween