datumaro
Known potential security issues in a server-side use of the library
These behaviors are not necessarily harmful when the library is used as a CLI tool, and they can even be desirable in that scenario. However, they can be undesirable when the library is used on the server side.
- Most formats enumerate the files that belong to a dataset in some kind of file list. If an input dataset contains a specially crafted file list (with absolute or relative paths), an attacker can make the server read an arbitrary file on the filesystem. If the server returns unfiltered error messages, this can lead to an information leak (server configuration information, version hints, etc.). The server can guard against this, for example, with a `chroot()` call.
- When a dataset is imported, every format searches for files recursively, descending from the specified root directory. If the input directory contains many junk files or many levels of nesting, this can be used for DoS attacks. To mitigate this, the library limits the maximum search depth to a small number (~5 levels).
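
Both concerns can also be illustrated in application code. The following is a minimal sketch, not part of the Datumaro API: `resolve_inside_root` and `walk_limited` are hypothetical helper names, and the default depth of 5 only mirrors the approximate limit mentioned above. The first helper rejects file-list entries that resolve outside the dataset root; the second stops descending once a depth limit is reached.

```python
import os
from pathlib import Path
from typing import Iterator


def resolve_inside_root(root: str, listed_path: str) -> Path:
    """Resolve a file-list entry and reject it if it lands outside `root`.

    Hypothetical helper for illustration only.
    """
    root_path = Path(root).resolve()
    # Joining with an absolute `listed_path` discards `root_path`, so absolute
    # entries are caught by the same containment check as "../" entries.
    candidate = (root_path / listed_path).resolve()
    if candidate != root_path and root_path not in candidate.parents:
        # Keep the message generic so the offending path is not echoed back.
        raise ValueError("path escapes the dataset root")
    return candidate


def walk_limited(root: str, max_depth: int = 5) -> Iterator[str]:
    """Yield files under `root`, descending at most `max_depth` levels.

    Hypothetical helper for illustration only.
    """
    root = os.path.abspath(root)
    base_depth = root.rstrip(os.sep).count(os.sep)
    for dirpath, dirnames, filenames in os.walk(root):
        depth = dirpath.rstrip(os.sep).count(os.sep) - base_depth
        if depth >= max_depth:
            # Emptying `dirnames` in place tells os.walk not to go deeper.
            dirnames[:] = []
        for name in filenames:
            yield os.path.join(dirpath, name)
```

A server would typically call `resolve_inside_root` on every path taken from an uploaded file list before opening it, and return a fixed error message to the client rather than the exception text. Because `Path.resolve()` follows symbolic links, a link inside the root that points elsewhere is rejected as well. A check like this complements OS-level isolation such as `chroot()`; it does not replace it.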