Lincoln Mullen
A local alignment, when printed, needs an extra newline at the end so that the command prompt does not stay on the same line as the output.
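A minimal sketch of the fix, assuming the alignment is an S3 object with its own print method. The class name `alignment_sketch` and its fields are hypothetical, not textreuse's actual structure. The idea is to end the printed text with `"\n"`, so that the prompt starts on a fresh line.

```r
# Hypothetical alignment object; class name and fields are illustrative only.
alignment <- structure(
  list(a_edits = "this is ### text", b_edits = "this is some text"),
  class = "alignment_sketch"
)

# Build the printed text in one place so the trailing newline is easy to check.
format.alignment_sketch <- function(x, ...) {
  paste0("Alignment\n",
         x$a_edits, "\n",
         "----\n",
         x$b_edits, "\n")  # final "\n" returns the prompt to a fresh line
}

print.alignment_sketch <- function(x, ...) {
  cat(format(x))
  invisible(x)  # print methods conventionally return their input invisibly
}

print(alignment)
```

Keeping the text in a `format()` method lets tests assert on the string directly without capturing console output.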
There should be a way to get both a dense matrix and a sparse matrix out of the textreuse candidates data frame. The sparse matrix should be in the format that apcluster expects.
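A minimal sketch of both conversions. It assumes the candidates data frame has one row per document pair, with columns `a`, `b`, and `score`. It also assumes apcluster will accept a sparse matrix from the Matrix package; that is worth confirming against the apcluster documentation for the installed version. The helper names `candidates_to_matrix` and `candidates_to_triplets` are hypothetical.

```r
# Hypothetical candidates data frame: one row per pair, columns a, b, score.
candidates <- data.frame(
  a = c("doc1", "doc1", "doc2"),
  b = c("doc2", "doc3", "doc3"),
  score = c(0.8, 0.1, 0.5),
  stringsAsFactors = FALSE
)

# Dense, symmetric similarity matrix; pairs never compared get `fill`.
candidates_to_matrix <- function(df, fill = 0, diag_value = 1) {
  ids <- sort(unique(c(df$a, df$b)))
  m <- matrix(fill, nrow = length(ids), ncol = length(ids),
              dimnames = list(ids, ids))
  m[cbind(df$a, df$b)] <- df$score  # each pair in the direction given
  m[cbind(df$b, df$a)] <- df$score  # mirrored, so the matrix is symmetric
  diag(m) <- diag_value
  m
}

# Triplet (i, j, x) form: the raw ingredients of a sparse matrix.
candidates_to_triplets <- function(df) {
  ids <- sort(unique(c(df$a, df$b)))
  list(i = match(c(df$a, df$b), ids),
       j = match(c(df$b, df$a), ids),
       x = c(df$score, df$score),
       ids = ids)
}

dense <- candidates_to_matrix(candidates)
trip <- candidates_to_triplets(candidates)

# Matrix is a recommended package shipped with R; guard in case it is absent.
if (requireNamespace("Matrix", quietly = TRUE)) {
  sparse <- Matrix::sparseMatrix(i = trip$i, j = trip$j, x = trip$x,
                                 dims = rep(length(trip$ids), 2),
                                 dimnames = list(trip$ids, trip$ids))
}
```

The sparse form leaves the diagonal empty. apcluster treats the diagonal as cluster preferences, so it may be better to let apcluster set those values than to fill them here.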
As in quanteda, the functions that call Rcpp implementations should be parallelized with RcppParallel. Each should take an argument that sets the number of cores: `cores = getOption("mc.cores")`...
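One detail to handle: `getOption("mc.cores")` returns `NULL` when the option is unset. A minimal sketch of resolving the argument and passing it to RcppParallel follows. The fallback of one thread is an assumption, and `resolve_cores` and `tokenize_words_sketch` are hypothetical names. `strsplit()` stands in for the compiled code.

```r
# Resolve the `cores` argument. getOption("mc.cores") is NULL when unset,
# so fall back to a single thread (an assumed default, not a package rule).
resolve_cores <- function(cores = getOption("mc.cores")) {
  if (is.null(cores)) cores <- 1L
  if (length(cores) != 1L || !is.numeric(cores) || is.na(cores) || cores < 1) {
    stop("`cores` must be a single positive number", call. = FALSE)
  }
  as.integer(cores)
}

# Hypothetical wrapper around an Rcpp/RcppParallel routine.
tokenize_words_sketch <- function(x, cores = getOption("mc.cores")) {
  cores <- resolve_cores(cores)
  if (requireNamespace("RcppParallel", quietly = TRUE)) {
    # Caps the number of threads RcppParallel will use.
    RcppParallel::setThreadOptions(numThreads = cores)
  }
  # Stand-in for the call into compiled, parallelized code.
  strsplit(x, "\\s+")
}
```

Because R evaluates default arguments lazily, `getOption("mc.cores")` is read each time the function is called, not when the function is defined.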
gensim has a lemmatizing tokenizer that converts words to their lemmas instead of stemming them. For instance, "was," "being," and "am" would all tokenize to "be." https://radimrehurek.com/gensim/utils.html
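To show the difference from stemming, here is a toy dictionary-lookup lemmatizer. This is not gensim's method, and the lexicon and `tokenize_lemmas_sketch` are hypothetical. A usable version would need a full lexicon and usually part-of-speech information.

```r
# Toy lemma lexicon (hypothetical); a real one would be far larger.
lemma_dict <- c(am = "be", is = "be", are = "be", was = "be",
                were = "be", being = "be", been = "be")

# Lowercase, split on anything that is not a letter or apostrophe,
# then replace each token found in the lexicon with its lemma.
tokenize_lemmas_sketch <- function(x, dict = lemma_dict) {
  words <- strsplit(tolower(x), "[^a-z']+")
  lapply(words, function(w) {
    w <- w[w != ""]                # drop empties from leading separators
    hit <- w %in% names(dict)
    w[hit] <- unname(dict[w[hit]])
    w
  })
}

tokenize_lemmas_sketch("I am sure it was being done.")
```

A stemmer would leave "was" and "am" unrelated to "be". The lookup maps all three irregular forms to the same lemma.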
- Methodist data
- Geography of the Post: https://github.com/stanford-history/geographypost/tree/master/data
- Overland Travels: https://history.lds.org/overlandtravels/
- The library data I scraped for Miriam Posner, if it has an open license.
Clarke Bursley has created a dataset of US troop strengths in various countries. Add this as a dataset.
If the license permits it.
In the Methodists data set, when a value for membership by race is missing, we replace it with 0 (mostly for pedagogical purposes). But if the meeting existed...
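Below is a minimal sketch of the zero-filling step that also records which values were originally missing, so that information is not lost. The data frame, its column names, and `zero_fill_with_flags` are hypothetical illustrations, not the data set's actual schema.

```r
# Hypothetical slice of a Methodists-style table; column names are illustrative.
meetings <- data.frame(
  meeting = c("A", "B", "C"),
  members_white = c(120, NA, 45),
  members_black = c(NA, 30, 12)
)

# Replace NA with 0 in the given columns, but add a logical flag column per
# column so the original missingness is preserved.
zero_fill_with_flags <- function(df, cols) {
  for (col in cols) {
    was_na <- is.na(df[[col]])
    df[[paste0(col, "_was_na")]] <- was_na
    df[[col]][was_na] <- 0
  }
  df
}

filled <- zero_fill_with_flags(meetings, c("members_white", "members_black"))
```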