Limitations and proper use of bibliometric network visualizations
The main idea of bibliometric network visualization is to allow large amounts of complex bibliographic data to be analyzed in a relatively easy way by visualizing core aspects of the data. The strength of bibliometric network visualization is in the simplification it provides, but simplification comes at a cost: it typically implies a loss of information.
Loss of information takes place in reducing bibliographic data to a bibliometric network. For instance, when textual data is reduced to a co-occurrence network of terms, information on the context in which terms co-occur is lost. Similarly, when we have constructed a citation network, we can see who is citing whom, but we can no longer see why someone may be citing someone else.
Loss of information also occurs in the visualization of a bibliometric network. In the case of a distance-based visualization, for instance, it is usually not possible to position the nodes in a two-dimensional space in such a way that for any pair of nodes the distance between the nodes reflects the relatedness of the nodes with perfect accuracy. Distances reflect relatedness only approximately, and we therefore lose information. In the case of graph-based and timeline-based visualizations, we may need to restrict ourselves to visualizing a limited number of nodes, for instance the nodes with the highest degree in a network. This means that we lose information on the other nodes in the network.
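As a rough illustration of the last point, the sketch below keeps only the highest-degree nodes of a small network and reports the share of total edge weight that survives the restriction. It is a minimal sketch under stated assumptions, not a standard procedure: the function name `retained_after_top_k`, the edge dictionary format, and the choice of retained edge weight as a proxy for retained information are all our own.

```python
def retained_after_top_k(edges, k):
    """Keep the k highest-degree nodes and measure what survives.

    edges: dict mapping an undirected node pair (u, v) to a positive weight,
           with each pair listed once (hypothetical input format).
    Returns the set of kept nodes and the fraction of total edge weight
    carried by edges whose two endpoints are both kept.
    """
    # Unweighted degree: number of edges incident to each node.
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    # Sort by descending degree; ties are broken by node name so that
    # the selection is deterministic.
    ranked = sorted(degree, key=lambda n: (-degree[n], str(n)))
    kept = set(ranked[:k])

    total = sum(edges.values())
    retained = sum(w for (u, v), w in edges.items() if u in kept and v in kept)
    return kept, (retained / total if total else 0.0)


# Toy co-occurrence network; the weights are invented for illustration.
toy_edges = {("a", "b"): 3, ("a", "c"): 2, ("a", "d"): 1, ("b", "c"): 4}
kept_nodes, share = retained_after_top_k(toy_edges, 3)
```

A low retained share suggests that the visualized subset leaves out a substantial part of the network. Edge weight is only one possible proxy, and it says nothing about which structures are lost.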
Loss of information is especially problematic because it is often difficult to assess how much information is lost and to what extent this may affect the conclusions that can be drawn from a bibliometric network visualization. For instance, to what extent do the distances between the nodes in a distance-based visualization accurately reflect the relatedness of the nodes? To what extent does a visualization of a term co-occurrence network change if the selection of terms included in the visualization is changed? Even if we are aware that there may be inaccuracies in a bibliometric network visualization, it remains difficult to assess the magnitude and the consequences of these inaccuracies.
Related to this, it is often difficult to assess the sensitivity of a bibliometric network visualization to various technical choices. Would other technical choices have resulted in a completely different visualization, or would the differences have been minor? How strongly does a visualization depend on the values of all kinds of technical parameters, and is it possible to justify the choice of particular parameter values? Is a certain structure suggested by a visualization a reflection of the underlying data, or is it merely an artifact of the techniques used to produce the visualization? Researchers who regularly work with bibliometric network visualizations develop an intuition that helps them to give approximate answers to these types of questions, but most users of bibliometric network visualizations lack such an intuition, making it difficult for them to assess the accuracy of a visualization.
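For users without such intuition, a crude sensitivity check is to produce the same analysis under two parameter settings and quantify how much the results differ. The sketch below compares two cluster assignments of the same nodes using the Rand index, the share of node pairs on which both assignments agree about whether the pair belongs together. It assumes that cluster assignments are available as plain dictionaries; the function name and the example assignments are hypothetical and not tied to any particular tool.

```python
from itertools import combinations


def rand_index(clustering_a, clustering_b):
    """Share of node pairs treated consistently by two clusterings.

    Each argument maps node -> cluster label. Only nodes present in both
    clusterings are compared. A value of 1.0 means the two clusterings
    group the nodes identically (labels themselves may differ).
    """
    nodes = sorted(set(clustering_a) & set(clustering_b))
    pairs = list(combinations(nodes, 2))
    if not pairs:
        raise ValueError("need at least two shared nodes")
    agree = sum(
        (clustering_a[u] == clustering_a[v]) == (clustering_b[u] == clustering_b[v])
        for u, v in pairs
    )
    return agree / len(pairs)


# Hypothetical assignments obtained with two different parameter values.
run_low = {"n1": 0, "n2": 0, "n3": 1, "n4": 1}
run_high = {"n1": 0, "n2": 0, "n3": 1, "n4": 0}
stability = rand_index(run_low, run_high)
```

Values well below 1 across plausible parameter settings suggest that the structure shown depends strongly on technical choices. The Rand index is not corrected for chance agreement, so it should be read only as a rough indication.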
Given the above difficulties, our general recommendation is to use bibliometric network visualizations as a complement to, rather than a substitute for, expert judgment. When expert judgment and bibliometric network visualizations agree and point in the same direction, they strengthen each other. When they do not agree, this may be a reason for experts to reconsider their opinion, to ask for the opinion of additional experts, or to check whether the visualizations are inaccurate because important information has been lost or because of methodological issues. Bibliometric network visualizations are most useful when they are interpreted in a careful manner and used in combination with expert judgment. Also, visualization should be a means to an end, not an end in itself. For instance, when dealing with only a small amount of data, there is often no added value in the use of visualization. It may be much better to simply study the data directly.