pdf-toolbox
workflow to extract text
Hi, I just used the master branch to do the common thing of extracting all text from a PDF. It worked, thanks for the nice library! It took a while to figure out how to do it, though, and it required more contortions than I expected. Perhaps you could add some API support for such a basic task? Here's what I wound up with; is this what you expect users to do?
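import qualified Data.Text as T
import qualified Data.Text.IO as T
import Pdf.Document

main :: IO ()
main = withPdfFile "file.pdf" $ \pdf -> do
  txt <- extract pdf =<< catalogPageNode =<< documentCatalog =<< document pdf
  T.putStr txt  -- a do block must end in an expression, so print the result

extract :: Pdf -> PageNode -> IO T.Text  -- depth-first walk of the page tree
extract pdf = (T.concat <$>) . (traverse ((extract' =<<) . loadPageNode pdf) =<<) . pageNodeKids
  where
    extract' (PageTreeLeaf tn) = pageExtractText tn
    extract' (PageTreeNode tn) = extract pdf tn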
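The point-free `extract` above works, but it packs the tree walk and the text extraction into one line. As a sketch of the same workflow with those two concerns separated, the version below first collects every leaf page in document order and only then extracts text page by page. It uses only the pdf-toolbox calls the snippet above already relies on (`withPdfFile`, `document`, `documentCatalog`, `catalogPageNode`, `pageNodeKids`, `loadPageNode`, `pageExtractText`) and assumes `Pdf.Document` also re-exports the `Pdf`, `PageNode`, `Page` and `PageTree` types. `collectPages` is a hypothetical helper, not part of the library.

```haskell
import Control.Monad (forM_)
import qualified Data.Text.IO as T
import Pdf.Document

-- Hypothetical helper (not part of pdf-toolbox): collect every leaf page
-- under a page-tree node, depth-first, so pages come out in document order.
collectPages :: Pdf -> PageNode -> IO [Page]
collectPages pdf node = do
  refs <- pageNodeKids node
  concat <$> mapM expand refs
  where
    -- each kid reference is either a page (leaf) or another page node
    expand ref = do
      tree <- loadPageNode pdf ref
      case tree of
        PageTreeLeaf page -> return [page]
        PageTreeNode kid  -> collectPages pdf kid

main :: IO ()
main = withPdfFile "file.pdf" $ \pdf -> do
  root  <- catalogPageNode =<< documentCatalog =<< document pdf
  pages <- collectPages pdf root
  -- extract and print each page's text, one page at a time
  forM_ pages $ \page -> T.putStrLn =<< pageExtractText page
```

Keeping the page list as a separate value also makes it straightforward to extract only a range of pages or to put a separator between them.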
Yeah, a simpler API would be great, though I'm not sure exactly what it should look like. I'll think about it; thank you for the input.
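One possible shape for such an API, sketched purely as an illustration: a single function that takes a file path and returns the text of each page. `pdfPageTexts` is a hypothetical name, not part of pdf-toolbox. Beyond the calls used in the issue, this sketch assumes `pageNodeNKids` (total number of leaf pages under a node) and `pageNodePageByNum` (page lookup by 0-based index) are exported by `Pdf.Document`; check both against the version you build with.

```haskell
import Control.Exception (evaluate)
import Control.Monad (forM)
import qualified Data.Text as T
import qualified Data.Text.IO as T
import Pdf.Document

-- Hypothetical convenience function (not part of pdf-toolbox):
-- open a PDF and return the extracted text of each page, in order.
pdfPageTexts :: FilePath -> IO [T.Text]
pdfPageTexts path = withPdfFile path $ \pdf -> do
  root <- catalogPageNode =<< documentCatalog =<< document pdf
  -- assumed: total number of leaf pages under the root node
  n <- pageNodeNKids root
  -- assumed: pageNodePageByNum counts pages from 0
  forM [0 .. n - 1] $ \i -> do
    txt <- pageExtractText =<< pageNodePageByNum root i
    -- force each strict Text before withPdfFile closes the file
    evaluate txt

main :: IO ()
main = mapM_ T.putStrLn =<< pdfPageTexts "file.pdf"
```

Because each `pageNodePageByNum` call descends from the root again, this costs roughly one tree descent per page; a single recursive walk like `extract` avoids that if it ever matters for large documents.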