
Allow an encoding such as "encoding='utf8'" to be passed into TextLoader if the file is not in the default system encoding.

Open lexsf opened this issue 2 years ago • 3 comments

lexsf avatar Mar 11 '23 00:03 lexsf

any fix for this issue?

AldawsariNLP avatar Mar 12 '23 19:03 AldawsariNLP

Yes, it would be a most welcome improvement. I have encoding issues with the Polish language.

LeNowak avatar Mar 21 '23 14:03 LeNowak

ah me too. German.

jonosooty avatar May 18 '23 12:05 jonosooty

Me too. French

LeMoussel avatar Sep 04 '23 13:09 LeMoussel

Korean, as well. :< Is there any update?

allzero-kwon avatar Oct 02 '23 12:10 allzero-kwon

Hi, @lexsf,

I'm helping the LangChain team manage their backlog and am marking this issue as stale. The issue you raised requests the ability to specify a non-default encoding, such as "utf8", when using TextLoader in the system to provide more flexibility in handling different file encodings. Several users have expressed similar encoding issues with different languages, such as Polish, German, French, and Korean, and are eager for a resolution. However, the issue remains unresolved.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and cooperation.

dosubot[bot] avatar Feb 05 '24 16:02 dosubot[bot]

Yeah bot, still there.

lexsf avatar Feb 05 '24 16:02 lexsf

🤖

@baskaryan, could you please assist @lexsf with the issue regarding specifying a non-default encoding when using TextLoader in LangChain? They have confirmed that the issue is still relevant. Thank you!



dosubot[bot] avatar Feb 05 '24 16:02 dosubot[bot]

Hi all, we do accept TextLoader(encoding=...) as of https://github.com/langchain-ai/langchain/pull/2250, is that what you're looking for?

Full API reference here
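
For anyone landing here, a minimal sketch of what passing the encoding could look like. The `load_text` helper, the sample sentence, and the choice of cp1250 are illustrative assumptions, not part of LangChain or of the original report. The import path shown is the one used in recent releases (older ones exposed the loader as `langchain.document_loaders.TextLoader`), and the helper falls back to a plain file read if LangChain is not installed, so the snippet runs either way.

```python
import tempfile
from pathlib import Path


def load_text(path: str, encoding: str) -> str:
    """Hypothetical helper: read `path` using an explicit encoding."""
    try:
        # Import path in recent releases; older ones used
        # `from langchain.document_loaders import TextLoader`.
        from langchain_community.document_loaders import TextLoader
    except ImportError:
        # LangChain is not installed: fall back to a plain read
        # with the same explicit encoding.
        return Path(path).read_text(encoding=encoding)
    # TextLoader returns a list with a single Document for the file.
    docs = TextLoader(path, encoding=encoding).load()
    return docs[0].page_content


sample = "Zażółć gęślą jaźń"  # Polish text with non-ASCII characters

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"
    # Write the file in cp1250, so its bytes are not valid UTF-8.
    path.write_bytes(sample.encode("cp1250"))
    loaded = load_text(str(path), encoding="cp1250")

print(loaded == sample)
```

The point is simply that the encoding passed to the loader has to match the one the file was saved in. Reading the same bytes as UTF-8 would fail to decode, which looks like the kind of error people were reporting above.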

baskaryan avatar Feb 05 '24 23:02 baskaryan