open-sustainable-technology
Refine the database of organizations
The database of active open source organizations in the field of environmental sustainability is more important to many users than the list of projects themselves. Therefore, here are several thoughts on how we could expand this database:
- Add the topics / fields in which the organizations are active.
- Cluster namespaces based on legal organizations. Many larger organizations such as Google, NOAA, NASA, or LF Energy have multiple namespaces that we could link together.
- We could generate activity scores for organization namespaces.
- Using the social media accounts of the various organizations, it would be easy to create a "news" feed about open source in the field of environmental sustainability. We will certainly have to exclude some very large, generic social media accounts and create a simple way to quickly review such posts.
I think we'll come up with a lot more here in the near future.
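As a rough sketch of the namespace clustering and activity score ideas above: the snippet below groups namespaces under a legal organization via a hand-maintained mapping and scores activity as the share of repositories with a commit inside a time window. The mapping entries, the function names, and the 365-day window are all assumptions made for illustration; none of them come from the existing datamining script.

```python
from collections import defaultdict
from datetime import date


def cluster_namespaces(namespace_to_org):
    """Group namespaces under the legal organization they belong to.

    `namespace_to_org` is a hypothetical, hand-maintained mapping.
    """
    clusters = defaultdict(list)
    for namespace, org in namespace_to_org.items():
        clusters[org].append(namespace)
    return {org: sorted(names) for org, names in clusters.items()}


def activity_score(last_commit_dates, today, window_days=365):
    """Share of repositories with a commit in the last `window_days` days.

    This definition is an assumption; other signals (issues, releases)
    could be weighted in as well.
    """
    if not last_commit_dates:
        return 0.0
    recent = sum(
        1 for d in last_commit_dates if (today - d).days <= window_days
    )
    return recent / len(last_commit_dates)


# Placeholder data, not real namespaces.
mapping = {
    "example-lab": "Example Lab",
    "example-lab-tools": "Example Lab",
    "other-org": "Other Org",
}
clusters = cluster_namespaces(mapping)
score = activity_score(
    [date(2023, 12, 1), date(2020, 1, 1)], today=date(2024, 1, 1)
)
```

Scoring a whole legal organization would then mean pooling the last-commit dates of all namespaces in its cluster before calling `activity_score`.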
We can probably do this easily from the Crunchbase data of projects; we capture the primary organization and then look up all the relevant organizational data in Crunchbase from there.
The organization data has been updated and can be found here as a spreadsheet: https://docs.getgrist.com/gSscJkc5Rb1R/OpenSustaintech/
Original CSV file: https://github.com/protontypes/AwesomeCure/blob/main/csv/github_organizations.csv
I cleaned and labeled the organization name, website, country, and form of organization for about 200 new organizations. Possible next steps could be:
- Map project topics to organizations so that we can group organizations by topic.
- Improve the "activity" value of organizations. The current datamining script does not update it correctly.
- Map organizations based on URL domain spaces.
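The last step could start from something like the sketch below, which groups organizations by the host part of their website URL. It assumes rows of `(name, website)` pairs; the function names are hypothetical. It also only strips a leading `www.`, so subdomains of the same organization would still land in separate groups unless a public suffix list is used to find the registrable domain.

```python
from collections import defaultdict
from urllib.parse import urlparse


def website_host(url):
    """Return the lowercased host of a URL without a leading 'www.'."""
    # Tolerate entries stored without a scheme, e.g. "example.org".
    parsed = urlparse(url if "://" in url else "https://" + url)
    host = (parsed.hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def group_by_host(rows):
    """Group organization names by website host.

    `rows` is an iterable of (name, website) pairs; rows without a
    usable website are skipped.
    """
    groups = defaultdict(list)
    for name, website in rows:
        host = website_host(website)
        if host:
            groups[host].append(name)
    return dict(groups)


# Placeholder rows, not taken from the spreadsheet.
rows = [
    ("org-a", "https://example.org"),
    ("org-b", "http://www.example.org/about"),
    ("org-c", ""),
]
groups = group_by_host(rows)
```

Hosts with more than one entry in `groups` are candidates for being linked to the same legal organization, pending manual review.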