langchain
fix markdown text splitter horizontal lines
Fixes #5614
Issue
The *** combination raises an exception when it is used as a separator in re.split, because an unescaped * is a regex quantifier with nothing to repeat. The escaped form \*\*\* should be used in regular expressions instead.
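The failure can be reproduced with the standard library alone. The snippet below is a minimal sketch, not the splitter's actual code: the sample text and the `try_split` helper are hypothetical and only illustrate how `re.split` treats the two forms of the separator.

```python
import re

# Hypothetical sample: two blocks separated by a Markdown horizontal rule.
text = "Intro paragraph\n***\nNext section"


def try_split(pattern, s):
    """Split s on pattern, returning the error message if the pattern is invalid."""
    try:
        return re.split(pattern, s)
    except re.error as exc:
        return f"re.error: {exc}"


# Unescaped: '*' is a quantifier with nothing before it, so compilation fails.
unescaped = try_split("***", text)

# Escaped: each '*' is matched literally, so the rule acts as a separator.
escaped = try_split(r"\*\*\*", text)

print(unescaped)
print(escaped)
```

The newlines around the rule stay attached to the neighbouring chunks here because the illustrative pattern matches only the asterisks.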
Who can review?
@eyurtsev
@devstein, I appreciate your suggestion, but using re.escape would also escape the '\' character in '\n', which isn't the intended behavior. These strings are regular expressions, so they should be written out explicitly, both for clarity and to make their behavior easier to understand.
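A short sketch of why blanket escaping does not fit separators that are meant to be regular expressions. The separator and sample text are hypothetical, chosen only to illustrate the point; they are not taken from the splitter's source.

```python
import re

# Hypothetical separator written as a regex: a line of three or more asterisks.
regex_separator = r"\n\*\*\*+\n"
text = "Intro\n*****\nBody"

# Used as-is, the pattern matches the horizontal rule and splits around it.
parts_regex = re.split(regex_separator, text)

# After re.escape, every backslash and metacharacter is taken literally,
# so the pattern no longer matches the rule and nothing is split.
parts_escaped = re.split(re.escape(regex_separator), text)

# re.escape also prefixes an actual newline character with a backslash.
newline_escaped = re.escape("\n")

print(parts_regex)
print(parts_escaped)
print(repr(newline_escaped))
```

Escaping would be appropriate only if every separator were a plain literal string. Once any separator relies on regex syntax such as `+`, it has to be written as a regex by hand.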
@hwchase17 I added a test for Markdown. There were a few issues with the regular expressions for RST and Markdown, which I fixed, and I added tests covering them.