localGPT
                        Programming Language Support for Documents
Originally posted by @sime2408 in https://github.com/PromtEngineer/localGPT/issues/151#issuecomment-1597633918
This ticket is to support different methods of Document splitting, specifically for different programming languages.
Currently, Documents are loaded and then split with vanilla RecursiveCharacterTextSplitter. As noted in langchain docs, this splitting is good for generic text as it keeps paragraphs together.
 
Different programming languages have different separators, which should split source-code documents better. They can be selected with RecursiveCharacterTextSplitter.from_language
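A rough sketch of how this could look, assuming the splitter is picked from the file extension. The `EXTENSION_TO_LANGUAGE` table and the `language_for` / `splitter_for` helpers are hypothetical names, not part of localGPT or langchain. The chunk sizes are placeholders, and the import path may differ between langchain versions.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping from file extension to the string value of
# langchain's `Language` enum. Only a few languages are listed here.
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".js": "js",
    ".java": "java",
    ".go": "go",
    ".cpp": "cpp",
}


def language_for(path: str) -> Optional[str]:
    """Return the language name for a file path, or None for generic text."""
    return EXTENSION_TO_LANGUAGE.get(Path(path).suffix.lower())


def splitter_for(path: str, chunk_size: int = 1000, chunk_overlap: int = 200):
    """Build a splitter suited to the file at `path`.

    langchain is imported lazily so that `language_for` stays usable
    without it. The import path below is an assumption about the
    installed langchain version.
    """
    from langchain.text_splitter import Language, RecursiveCharacterTextSplitter

    name = language_for(path)
    if name is None:
        # Fall back to the generic splitter currently in use.
        return RecursiveCharacterTextSplitter(
            chunk_size=chunk_size, chunk_overlap=chunk_overlap
        )
    # Use the language-specific separators.
    return RecursiveCharacterTextSplitter.from_language(
        language=Language(name),
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
    )
```

Files with an unknown extension fall back to the vanilla splitter, so existing behaviour for plain text would stay the same.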
It'll be efficient if we can load documents into a Dict (see #147 and linked conversation)
@imjwang this is really helpful. I will be merging a major code change over the next couple of days, can you please look into this afterwards? Thanks
Certainly