bigbang
Scientific analysis of collaborative communities
As noted in #355, we just need to update the script help text to state that the date format is ordinal.
Some users want to extract data from mailing lists for graphing and analysis in some other program, like Excel. We should make it clearer how to export data to CSV...
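A minimal sketch of what such an export could look like, assuming the mailing list data is already in a pandas DataFrame. The column names and the `export_to_csv` helper are hypothetical and used only for illustration; they are not part of BigBang.

```python
import io

import pandas as pd

# Hypothetical stand-in for mailing list data; a real archive
# may use different columns.
messages = pd.DataFrame(
    {
        "From": ["alice@example.org", "bob@example.org"],
        "Subject": ["Hello", "Re: Hello"],
        "Date": pd.to_datetime(["2016-06-01", "2016-06-02"]),
    }
)


def export_to_csv(df, target):
    """Write df as CSV to a path or text buffer, leaving out the index."""
    # index=False keeps the row index out of the file, so spreadsheet
    # programs see only the named columns.
    df.to_csv(target, index=False)


# Write to an in-memory buffer here; a file path such as
# "messages.csv" works the same way.
buffer = io.StringIO()
export_to_csv(messages, buffer)
csv_text = buffer.getvalue()
```

The resulting file can be opened directly in Excel or a similar program.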
See https://github.com/nllz/bigbang/issues/10#issuecomment-227876957 for context. The documentation on using collect_mail.py with file import is not clear enough. URLs need to point specifically to the archive page of each list.
There are multiple "entity resolution" functions for mailing lists currently in BigBang. These should be consolidated into a single module with the relevant differences documented.
```
from bigbang import repo_loader  # The file that handles most loading

repo = repo_loader.get_repo("numpy", in_type="name")
repo.commit_data[:5]
```
The dataframe's first column is named "Unnamed: 0". It...
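One common way an "Unnamed: 0" column appears in pandas is a CSV round trip in which the index is written out but not read back as the index. The sketch below reproduces that pattern under the assumption that the commit data is cached as CSV, which this report does not confirm; the column names are hypothetical.

```python
import io

import pandas as pd

# Hypothetical commit data, used only to reproduce the symptom.
commits = pd.DataFrame(
    {"Committer Name": ["a", "b"], "Touched File": ["x.py", "y.py"]}
)

buffer = io.StringIO()
commits.to_csv(buffer)  # the index is written as an unlabeled first column

buffer.seek(0)
naive = pd.read_csv(buffer)  # the unlabeled column becomes "Unnamed: 0"

buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)  # first column read back as the index
```

If this is the cause, either passing `index_col=0` when reading or `index=False` when writing would avoid the extra column.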
In git_repo.py, when I use get_repo(url,"remote"), cache = "none", so the function calls self.repo = Repo(url), but Repo is undefined. I can't figure out which class is meant here.
process.activity() doesn't exist. This seems to be because the Archive wrapper class is not used, so we have to wrap the dataframe and use get_activity(). I will fix it later.
I tried to use the Single Word Trend notebook example, and it raises an error when iterating over the archive. Some mails (they seem to be multipart mails) have a None body...
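A minimal sketch of a defensive iteration that skips messages without a text body. The `count_word` helper and the sample data are hypothetical and are not taken from the notebook.

```python
def count_word(bodies, word):
    """Count messages whose body contains `word`, ignoring non-text bodies."""
    word = word.lower()
    count = 0
    for body in bodies:
        # Skip None (and anything else that is not a string, such as
        # NaN in a pandas column) instead of failing on it.
        if not isinstance(body, str):
            continue
        if word in body.lower():
            count += 1
    return count


# Hypothetical message bodies, including one missing body.
bodies = ["Python is great", None, "I prefer python 3", "No match here"]
matches = count_word(bodies, "python")
```

Skipping such messages hides the symptom only; extracting the text part of multipart mails would still need a separate fix.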
Currently, the repo loader files and their logic are too complicated and almost certainly contain bugs. I need to make sure they will be able to handle loading many different...
Sometimes when an Archive is loaded from data on the file system, the "Date" column is of type string rather than datetime. This irregularity makes it hard to compare mailing...
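A minimal sketch of normalizing such a column after loading, assuming the data is a pandas DataFrame with a "Date" column. The `normalize_dates` helper and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical archive data in which dates were loaded as strings.
archive_data = pd.DataFrame(
    {
        "Date": ["2016-06-01 12:00:00", "2016-06-02 08:30:00", "not a date"],
        "Subject": ["Hello", "Re: Hello", "Broken header"],
    }
)


def normalize_dates(df, column="Date"):
    """Return a copy of df with `column` parsed to timezone-aware datetimes."""
    out = df.copy()
    # errors="coerce" turns unparseable values into NaT instead of raising;
    # utc=True gives every row the same timezone so values are comparable.
    out[column] = pd.to_datetime(out[column], errors="coerce", utc=True)
    return out


normalized = normalize_dates(archive_data)
```

Note that `utc=True` treats strings without an offset as UTC, which may or may not match how a given list records its dates.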