Allow an encoding argument, such as encoding='utf8', to be passed to TextLoader for files that are not in the default system encoding.
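For context, here is a minimal sketch of the underlying problem using only the Python standard library, with no LangChain involved. The sample text and the use of cp1252 as a stand-in for a system default encoding are assumptions for illustration.

```python
import os
import tempfile

# Sample Polish text (an assumption for illustration; any non-ASCII text would do).
text = "Zażółć gęślą jaźń"

# Write the file as UTF-8.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as f:
    f.write(text)
    path = f.name

try:
    # Reading with a mismatched encoding (cp1252 stands in for a system default)
    # either raises UnicodeDecodeError or silently yields garbled text.
    try:
        with open(path, encoding="cp1252") as f:
            wrong = f.read()
    except UnicodeDecodeError:
        wrong = None

    # Reading with the encoding the file was written in round-trips correctly.
    with open(path, encoding="utf-8") as f:
        right = f.read()
finally:
    os.remove(path)

print("mismatched read equals original:", wrong == text)
print("utf-8 read equals original:", right == text)
```

A loader that always uses the system default behaves like the mismatched read here, which is why an explicit encoding argument is needed.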
Any fix for this issue?
Yes, it would be a most welcome improvement. I have encoding issues with the Polish language.
Ah, me too. German.
Me too. French.
Korean as well. :< Is there any update?
Hi, @lexsf,
I'm helping the LangChain team manage their backlog and am marking this issue as stale. The issue you raised requests the ability to specify a non-default encoding, such as "utf8", when using TextLoader, to provide more flexibility in handling different file encodings. Several users have reported similar encoding issues with different languages, such as Polish, German, French, and Korean, and are eager for a resolution. However, the issue remains unresolved.
Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your understanding and cooperation.
Yeah bot, still there.
🤖
@baskaryan, could you please assist @lexsf with the issue regarding specifying a non-default encoding when using TextLoader in LangChain? They have confirmed that the issue is still relevant. Thank you!
Hi all, we do accept TextLoader(encoding=...) as of https://github.com/langchain-ai/langchain/pull/2250. Is that what you're looking for?
Full API reference here
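A minimal sketch of what passing the encoding might look like. The import path below is an assumption about the installed version (older releases exposed the loader from `langchain.document_loaders`). The `load_text` helper and the sample text are hypothetical, added only for illustration. The plain `open()` fallback is there so the snippet still runs without LangChain installed.

```python
import os
import tempfile

try:
    # Assumed import path for recent releases; adjust for your version.
    from langchain_community.document_loaders import TextLoader
except ImportError:
    TextLoader = None  # LangChain not installed; use plain open() instead.


def load_text(path: str, encoding: str = "utf-8") -> str:
    """Hypothetical helper: read a text file with an explicit encoding."""
    if TextLoader is not None:
        # TextLoader accepts an `encoding` keyword argument.
        docs = TextLoader(path, encoding=encoding).load()
        return docs[0].page_content
    with open(path, encoding=encoding) as f:
        return f.read()


# Sample Korean text (an assumption for illustration).
sample = "한국어 텍스트"

with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as f:
    f.write(sample)
    path = f.name

try:
    loaded = load_text(path, encoding="utf-8")
finally:
    os.remove(path)

print(loaded == sample)
```

If the file's encoding is not known in advance, passing the wrong value will still fail or garble the text, so the argument should match how the file was actually written.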