langchain icon indicating copy to clipboard operation
langchain copied to clipboard

Refactor TelegramChatLoader and FacebookChatLoader classes and add tests

Open hp0404 opened this issue 2 years ago • 4 comments

This PR includes two main changes:

  • Refactor the TelegramChatLoader and FacebookChatLoader classes by removing the dependency on pandas and simplifying the message filtering process.

  • Add test cases for the TelegramChatLoader and FacebookChatLoader classes. This test ensures that the class correctly loads and processes the example chat data, providing better test coverage for this functionality.

hp0404 avatar May 01 '23 07:05 hp0404

I noticed that the FacebookChatLoader uses the same pattern and rewrote it too - added changes that follow the same logic as with TelegramChatLoader (I wasn't sure if it was worth a separate PR).

hp0404 avatar May 01 '23 11:05 hp0404

  • Refactor the TelegramChatLoader and FacebookChatLoader classes by removing the dependency on pandas and simplifying the message filtering process.

why is this desirable? presumably, the filtering done by pandas was adding some functionality that is now gone

hwchase17 avatar May 02 '23 03:05 hwchase17

  • Refactor the TelegramChatLoader and FacebookChatLoader classes by removing the dependency on pandas and simplifying the message filtering process.

why is this desirable? presumably, the filtering done by pandas was adding some functionality that is now gone

we could add a parameter to NOT do the filtering, if you really want a way to avoid it. but i hesitate to delete it entirely

hwchase17 avatar May 02 '23 03:05 hwchase17

  • Refactor the TelegramChatLoader and FacebookChatLoader classes by removing the dependency on pandas and simplifying the message filtering process.

why is this desirable? presumably, the filtering done by pandas was adding some functionality that is now gone

I believe the original pandas code and the refactored version without pandas work the same way (the code passes tests, at least).

  • with pandas, the code:
    • created a dataframe from the 'messages' list
    • then discarded irrelevant items (keep "message" type and contents must be of string type)
    • then formatted rows into a string
  • refactored version builds a formatted string without pandas, also discarding irrelevant items (with the same filters)

If the functionality is the same (if I'm correct (have I missed something?), it is the same), then why ask users to install pandas if it can be done without it.

hp0404 avatar May 02 '23 08:05 hp0404