topic-modeling-tool
CSV file as a set of documents?
Is it possible to use a CSV file as the set of input documents (i.e., where each row in the CSV file represents a different document)? We have a dataset containing thousands of documents and it's not practical to have each of these as a separate text file.
This is a feature of MALLET, and it used to be available in the Topic Modeling Tool (TMT), but it proved difficult to maintain the tool while allowing both modes of input. However, we've done some refactoring since then, and it might be easier now. I've been thinking about this for a while and will look into it. Thanks for the suggestion!
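In the meantime, one possible workaround is to split the CSV into one text file per row and point the tool at the resulting directory. Here is a minimal Python sketch of that idea. It assumes the CSV has a header row; the function name, the column names (`text`, `id`), and the `doc_000001.txt` naming scheme are all illustrative assumptions, not anything provided by the tool or by MALLET.

```python
import csv
from pathlib import Path


def split_csv_to_text_files(csv_path, out_dir, text_column="text", id_column=None):
    """Write one .txt file per CSV row and return the number of files written.

    Hypothetical helper: column names and file naming are assumptions.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = 0

    # newline="" lets the csv module handle line breaks inside quoted fields.
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        for row_number, row in enumerate(reader, start=1):
            text = (row.get(text_column) or "").strip()
            if not text:
                continue  # skip rows that have no document text

            # Name the file after the id column when one is given and non-empty,
            # otherwise fall back to the row number.
            stem = (row.get(id_column) or "").strip() if id_column else ""
            if not stem:
                stem = f"doc_{row_number:06d}"

            # Replace characters that are awkward in file names.
            safe_stem = "".join(c if c.isalnum() or c in "-_" else "_" for c in stem)

            (out_path / f"{safe_stem}.txt").write_text(text, encoding="utf-8")
            written += 1

    return written
```

For example, `split_csv_to_text_files("documents.csv", "input_docs", text_column="text", id_column="id")` would fill `input_docs` with one file per non-empty row, which can then be selected as the input directory. Note that rows sharing the same id would overwrite each other, so ids should be unique.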