open-research
[Regular Meeting] Reference Relation Extraction for building Open Source Database Socio-Technical Network
Description
Relationships in GitHub Events
- Actor-Actor: Collaboration [co-occurrence, member, follow]
- Material-Material: Dependency [commit (WoC), submodule, package (npm, pip, maven, ...)]
- Actor-Material: Reference [Distributor/Distributed, Participant, Author]
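The three categories above can be read as a rule on the kinds of the two endpoints of an edge. A minimal sketch of that typed edge model follows, assuming every node is tagged as either an actor or a material; the class names, fields and subtype strings are hypothetical and only mirror the list above.

```python
from dataclasses import dataclass
from enum import Enum


class NodeKind(Enum):
    ACTOR = "actor"        # a GitHub account (developer or bot)
    MATERIAL = "material"  # a repository, issue, PR, commit, package, ...


class RelationCategory(Enum):
    COLLABORATION = "collaboration"  # Actor-Actor: co-occurrence, member, follow
    DEPENDENCY = "dependency"        # Material-Material: commit, submodule, package
    REFERENCE = "reference"          # Actor-Material: distributor/distributed, participant, author


@dataclass(frozen=True)
class Edge:
    # Hypothetical edge record; node identifiers are plain strings here.
    source: str
    source_kind: NodeKind
    target: str
    target_kind: NodeKind
    subtype: str  # e.g. "follow", "submodule", "Author"

    @property
    def category(self) -> RelationCategory:
        # The category depends only on which kinds of node the edge connects.
        kinds = {self.source_kind, self.target_kind}
        if kinds == {NodeKind.ACTOR}:
            return RelationCategory.COLLABORATION
        if kinds == {NodeKind.MATERIAL}:
            return RelationCategory.DEPENDENCY
        return RelationCategory.REFERENCE
```

Because the category is derived from the endpoint kinds rather than stored, an Actor-Material edge is classified as Reference regardless of its direction.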
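Introduction
Beyond person-to-person collaboration and material-to-material dependencies, GitHub also contains a socio-technical network between people and materials, which is vital to project maintenance. The reference relations in this network are key to studying the knowledge of cross-project technical dependencies [1] in software ecosystems: by comparison, from an information-dependency perspective, dependencies between materials are relatively sparse, and collaboration between people is relatively ill-defined. This work takes open source database repositories as its subject and, from their GitHub log data, builds a Socio-Technical Network whose edges are Reference relations. It explores knowledge dependencies in the open source database domain on GitHub, together with the people and materials involved, which is of particular significance for project security, project maintenance, and a project's community influence.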
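To make the construction step concrete, here is a minimal sketch that turns event log records into weighted, typed edges. It assumes a hypothetical simplified event record (keys "repo", "actor", "type", "number", "body"); real GitHub event payloads are shaped differently. It also assumes a hypothetical subtype mapping in which opening an item makes the actor its Author and any other event counts as participation. Only bare "#123" and "owner/repo#123" links are recognized, and "links_to" is a placeholder label for material-to-material references; the Distributor/Distributed subtypes are not modeled.

```python
import re
from collections import Counter

# Hypothetical pattern: "owner/repo#123" or a bare "#123" (same repository).
REF_PATTERN = re.compile(r"(?<![\w/])(?:([\w.-]+/[\w.-]+))?#(\d+)\b")


def build_reference_edges(events):
    """Aggregate simplified event records into weighted, typed edges.

    Each event is assumed to be a dict with keys "repo", "actor", "type",
    "number" and "body". Returns a Counter mapping
    (source, target, subtype) -> weight.
    """
    edges = Counter()
    for ev in events:
        repo = ev["repo"]
        actor = f"actor:{ev['actor']}"
        item = f"{repo}#{ev['number']}"
        # Actor-Material edge: "opened" makes the actor the Author,
        # any other event type counts as participation (hypothetical mapping).
        subtype = "Author" if ev["type"] == "opened" else "Participant"
        edges[(actor, item, subtype)] += 1
        # Item-to-item reference edges from links found in the body text.
        for target_repo, number in REF_PATTERN.findall(ev.get("body") or ""):
            target = f"{target_repo or repo}#{number}"
            if target != item:  # ignore self-references
                edges[(item, target, "links_to")] += 1
    return edges
```

Edge weights count repeated interactions, so the resulting Counter could be converted into a weighted directed graph for measuring reference characteristics such as degree or reciprocity.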
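Related Works
In the network built from this data, edges are reference relations of several subtypes and nodes are the entities participating in each event log. Compared with [1, 2, 6, 8], the edges and nodes are finer-grained and the network carries richer information: besides references between projects, it also captures the exchange of information between people and materials during maintenance within a project. Compared with the statistical analyses in [3, 4, 7], building a network makes it possible to measure more reference characteristics. Compared with [5], the setting focuses more on collaboration and maintenance. In particular, considered on its own, the method for identifying links in body text is very similar to [8]; the identification rules used here are based on the GitHub REST API documentation, and the modeling of nodes and edges is more systematic. Future work could compare the number of links identified by both approaches on the same dataset.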
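As an illustration of rule-based link identification in body text, the sketch below recognizes a subset of the autolinked reference formats described in GitHub's documentation: "#26", "GH-26", "owner/repo#26", "owner/repo@SHA", and full issue, pull request and commit URLs. The patterns, the function name extract_references and the output tuple layout are assumptions for illustration, not the rule set actually used in this work. URLs are removed before the short-form patterns run so that no link is counted twice.

```python
import re

# Full URLs to issues, pull requests and commits on github.com.
URL_RE = re.compile(
    r"https://github\.com/([\w.-]+/[\w.-]+)/(issues|pull|commit)/([0-9a-f]{7,40}|\d+)"
)
# "#26", "GH-26" or "owner/repo#26"; the look-behind avoids matching mid-token.
ISSUE_RE = re.compile(r"(?<![\w/.-])(?:([\w.-]+/[\w.-]+)#|#|GH-)(\d+)\b")
# "owner/repo@SHA" with an abbreviated (>= 7) or full (40) hex SHA.
COMMIT_RE = re.compile(r"(?<![\w/.-])([\w.-]+/[\w.-]+)@([0-9a-f]{7,40})\b")


def extract_references(body: str, default_repo: str) -> list[tuple[str, str, str]]:
    """Return (kind, repo, identifier) tuples for links found in body text."""
    refs = []
    for repo, kind, ident in URL_RE.findall(body):
        refs.append(("commit" if kind == "commit" else "issue_or_pr", repo, ident))
    # Blank out URLs so their components are not matched again below.
    rest = URL_RE.sub(" ", body)
    for repo, number in ISSUE_RE.findall(rest):
        # Short forms without an owner/repo prefix refer to the current repository.
        refs.append(("issue_or_pr", repo or default_repo, number))
    for repo, sha in COMMIT_RE.findall(rest):
        refs.append(("commit", repo, sha))
    return refs
```

Counting the tuples this returns per body is one way to produce the per-dataset link counts that a comparison with [8] would need.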
Attachment
Reference Relation Extraction for building Open Source Database Socio-Technical Network.pptx
References
[1] Blincoe K, Harrison F, Kaur N, et al. Reference coupling: An exploration of inter-project technical dependencies and their characteristics within large software ecosystems[J]. Information and Software Technology, 2019, 110: 174-189.
[2] Blincoe K, Harrison F, Damian D. Ecosystems in GitHub and a method for ecosystem identification using reference coupling[C]//2015 IEEE/ACM 12th Working Conference on Mining Software Repositories. IEEE, 2015: 202-211.
[3] Zhang Y, Yu Y, Wang H, et al. Within-ecosystem issue linking: a large-scale study of rails[C]//Proceedings of the 7th International Workshop on Software Mining. 2018: 12-19.
[4] Jiang J, Cao J, Zhang L. An empirical study of link sharing in review comments[C]//Software Engineering and Methodology for Emerging Domains: 16th National Conference, NASAC 2017, Harbin, China, November 4–5, 2017, and 17th National Conference, NASAC 2018, Shenzhen, China, November 23–25, 2018, Revised Selected Papers 16. Springer Singapore, 2019: 101-114.
[5] Ye D, Xing Z, Kapre N. The structure and dynamics of knowledge network in domain-specific q&a sites: a case study of stack overflow[J]. Empirical Software Engineering, 2017, 22: 375-406.
[6] Wang D, Xiao T, Thongtanunam P, et al. Understanding shared links and their intentions to meet information needs in modern code review: A case study of the OpenStack and Qt projects[J]. Empirical Software Engineering, 2021, 26: 1-32.
[7] Zhang Y, Wu Y, Wang T, et al. iLinker: a novel approach for issue knowledge acquisition in GitHub projects[J]. World Wide Web, 2020, 23(3): 1589-1619.
[8] Liu B, Zhang L, Jiang J, et al. A method for identifying references between projects in GitHub[J]. Science of Computer Programming, 2022, 222: 102858.