Joseph KK Ho e-resources: A mind mapping-based literature review approach to study Big Data

A mind mapping-based literature review (MMBLR) approach to study the Big Data topic

Joseph Kim-keung Ho

Independent Trainer

Hong Kong, China

Abstract: Big Data emerged in the early 2000s as a topic in eCommerce. This paper examines it by means of the mind mapping-based literature review (MMBLR) approach of Ho (2016) to unveil its knowledge structure. The exercise indicates that this knowledge structure comprises four main themes with associated ideas, viewpoints and empirical findings. The article also illustrates how to conduct a mind mapping-based literature review. All in all, the article offers academic and pedagogical value on the topics of Big Data and the MMBLR approach. The exercise itself also provides a stimulating intellectual learning experience for the literature reviewer.

Keywords: Big Data, literature review, mind map, the mind mapping-based literature review (MMBLR) approach

Please cite the article as: Ho, J.K.K. 2016. “A mind mapping-based literature review (MMBLR) approach to study the Big Data topic” Joseph KK Ho e-resources blog October 9 (url address: http://josephho33.blogspot.hk/2016/10/a-mind-mapping-based-literature-review.html).

Introduction

The topic of Big Data is a relatively recent one, having emerged in the early 2000s. It is of academic and pedagogical interest to the writer, who has been a lecturer on eCommerce. In this article, the writer presents his literature review findings on Big Data using the mind mapping-based literature review (MMBLR) approach. The writer proposed this approach this year and has employed it to review the literature on a number of topics, such as supply chain management, strategic management accounting and customer relationship management (Ho, 2016). The overall aims of this exercise are to:

1. Render an image of the knowledge structure of Big Data via the application of the MMBLR approach;

2. Illustrate how the MMBLR approach can be applied in literature review, especially in preliminary literature review.

The findings from the review exercise offer academic and pedagogical value to those who are interested in the topics of Big Data, literature review and the MMBLR approach. In addition, the exercise facilitates this writer’s intellectual learning on these three topics. The next section briefly introduces the MMBLR approach. After that, an account of how it is applied to study Big Data is presented.

An application of the mind mapping-based literature review (MMBLR) approach

The mind mapping-based literature review (MMBLR) approach was developed by this writer this year (Ho, 2016). It uses mind mapping as a literature review exercise that complements thematic analysis (see the Literature on mind mapping Facebook page and the Literature on literature review Facebook page). The MMBLR approach is made up of two steps. Step 1 is a thematic analysis of the literature on the topic chosen for study. Step 2 uses the findings from step 1 to produce a complementary mind map. The MMBLR approach is a relatively straightforward and brief exercise for studying an academic topic. It is also an interpretive exercise, in the sense that reviewers with different research interests and intellectual backgrounds will inevitably select somewhat different ideas, facts and findings in their thematic analysis (i.e., step 1 of the MMBLR approach). To conduct the approach, the reviewer also needs to perform a literature search beforehand. Evidently, what a reviewer gathers from a literature search depends on which library/e-library facilities are available to the reviewer. The next section presents the writer’s findings from the MMBLR approach step 1; afterward, a companion mind map is provided based on these step 1 findings.

A thematic analysis on the Big Data literature

Step 1 of the MMBLR approach is a thematic analysis of the literature on the topic under investigation (Ho, 2016). In our case, this is the Big Data topic. The writer gathered a number of academic articles from some universities’ e-libraries as well as via Google Scholar. He then reviewed these articles to extract a set of ideas, viewpoints, concepts and findings (called points here). These points from the Big Data literature are grouped into four themes here, and some of the themes are further divided into sub-themes. The thematic analysis endeavour is interpretive. The thematic analysis findings are as follows:

Theme 1: Definitions and characteristics of Big Data

Point 1.1. “Big data is a term that describes large volumes of high velocity, complex and variable data that require advanced techniques and technologies to enable the capture, storage, distribution, management, and analysis of the information” (TechAmerica Foundation’s Federal Big Data Commission, 2012);

Point 1.2. “Laney (2001) suggested that Volume, Variety, and Velocity (or the Three V’s) are the three dimensions of challenges in data management. The Three V’s have emerged as a common framework to describe big data” (Gandomi and Haider, 2015);

Point 1.3. “Big Data is a loosely defined term used to describe data sets so large and complex that they become awkward to work with using standard statistical software..” (Snijders, Matzat and Reips, 2012);

Point 1.4. “big data are often characterized by relatively “low value density”. That is, the data received in the original form usually has a low value relative to its volume. However, a high value can be obtained by analyzing large volumes of such data” (Gandomi and Haider, 2015);

Point 1.5. “We define Big Data as a cultural, technological, and scholarly phenomenon that rests on the interplay of: (1) Technology … (2) Analysis … (3) Mythology …” (Boyd and Crawford, 2012);

Point 1.6. “Big Data is less about data that is big than it is about a capacity to search, aggregate, and cross-reference large data sets…” (Boyd and Crawford, 2012);

Point 1.7. “The fast evolution of big data technologies and the ready acceptance of the concept by public and private sectors left little time for the discourse to develop and mature in the academic domain” (Gandomi and Haider, 2015);

Theme 2: Business and technological trends related to Big Data

Theme 2.1: Business-related

Point 2.1.1. “….organizations are swimming in an expanding sea of data that is either too voluminous or too unstructured to be managed and analyzed through traditional means. Among its burgeoning sources are the clickstream data from the Web, social media content (tweets, blogs, Facebook wall postings, etc.) and video data from retail and other settings and from video entertainment…” (Davenport, Barth and Bean, 2012);

Point 2.1.2. “…. In business, economics and other fields … decisions will increasingly be based on data and analysis rather than on experience and intuition” (Lohr, 2012);

Point 2.1.3. “..A key tenet of big data is that the world and the data that describe it are constantly changing, and organizations that can recognize the changes and react quickly and intelligently will have the upper hand…” (Davenport, Barth and Bean, 2012);

Point 2.1.4. “…. Over time, we believe big data may well become a new type of corporate asset that will cut across business units and function much as a powerful brand does, representing a key basis for competition. If that’s right, companies need to start thinking in earnest about whether they are organized to exploit big data’s potential and to manage the threats it can pose” (Brown, Chui and Manyika, 2011);

Point 2.1.5. “As the volume of data explodes, organizations will need analytic tools that are reliable, robust and capable of being automated. At the same time, the analytics, algorithms and user interfaces they employ will need to facilitate interactions with the people who work with the tools…” (Davenport, Barth and Bean, 2012);

Theme 2.2: Technology-related

Point 2.2.1. “..the computer tools for gleaning knowledge and insights from the Internet era’s vast trove of unstructured data are fast gaining ground.” (Lohr, 2012);

Point 2.2.2. “..…Over the past few years, nearly all major companies, including EMC, Oracle, IBM, Microsoft, Google, Amazon, and Facebook, etc. have started their big data projects..” (Chen, Mao and Liu, 2014);

Point 2.2.3. “It is estimated that the business data volume of all companies in the world may double every 1.2 years ….. The continuously increasing business data volume requires more effective real-time analysis so as to fully harvest its potential…” (Chen, Mao and Liu, 2014);

Point 2.2.4. “Although major innovations in analytical techniques for big data have not yet taken place, one anticipates the emergence of such novel analytics in the near future. For instance, real-time analytics will likely become a prolific field of research because of the growth in location-aware social media and mobile apps.” (Gandomi and Haider, 2015);

Theme 3: Management practices and challenges related to Big Data

Theme 3.1: Associated technology-related

Point 3.1.1. “..The development of cloud computing provides solutions for the storage and processing of big data. On the other hand, the emergence of big data also accelerates the development of cloud computing…” (Chen, Mao and Liu, 2014);

Point 3.1.2. “..…..At present, the data processing capacity of IoT [internet of things] has fallen behind the collected data and it is extremely urgent to accelerate the introduction of big data technologies to promote the development of IoT.” (Chen, Mao and Liu, 2014);

Point 3.1.3. “In the IoT [internet of things] paradigm, an enormous amount of networking sensors are embedded into various devices and machines in the real world.… The big data generated by IoT has different characteristics compared with general big data..” (Chen, Mao and Liu, 2014);

Point 3.1.4. “the specialized tools of Big Data also have their own inbuilt limitations and restrictions. For example, Twitter and Facebook are examples of Big Data sources that offer very poor archiving and search functions…” (Boyd and Crawford, 2012);

Theme 3.2: The data management-related

Point 3.2.1. “Most DBMSs [database management systems] are designed for efficient transaction processing: adding, updating, searching for, and retrieving small amounts of information in a large database…..…The trouble comes when we want to take that accumulated data, collected over months or years, and learn something from it and naturally we want the answer in seconds or minutes! The pathologies of big data are primarily those of analysis…” (Jacobs, 2009);

Point 3.2.2. “The latest advances of information technology (IT) make it more easily to generate data. … Therefore, we are confronted with the main challenge of collecting and integrating massive data from widely distributed data sources…” (Chen, Mao and Liu, 2014);

Point 3.2.3. “Presently, Hadoop is widely used in big data applications in the industry, e.g., spam filtering, network searching, clickstream analysis, and social recommendation…” (Chen, Mao and Liu, 2014);

Point 3.2.4. “In the big data paradigm, the data center not only is a platform for concentrated storage of data, but also undertakes more responsibilities, such as acquiring data, managing data, organizing data, and leveraging the data values and functions.” (Chen, Mao and Liu, 2014);

Point 3.2.5. “Data collection is to utilize special data collection techniques to acquire raw data from a specific data generation environment. Four common data collection methods are shown as follows. – Log files….– Sensing…– Methods for acquiring network data -Libpcap-based packet capture technology..” (Chen, Mao and Liu, 2014);

Point 3.2.6. “Large data sets from Internet sources are often unreliable, prone to outages and losses, and these errors and gaps are magnified when multiple data sets are used together…” (Boyd and Crawford, 2012);

Point 3.2.7. “The first challenge brought about by big data is how to develop a large scale distributed storage system for efficiently data processing and analysis…” (Chen, Mao and Liu, 2014);

Point 3.2.8. “‘Data can frequently be collected passively, without much effort or even awareness on the part of those being recorded. And because the cost of storage has fallen so much, it is easier to justify keeping data than discarding it,’ observe Viktor Mayer-Schönberger and Kenneth Cukier…..” (Hayashi, 2014);

Point 3.2.9. “..distributed analysis of big data comes with its own set of “gotchas.” One of the major problems is nonuniform distribution of work across nodes…” (Jacobs, 2009);

Point 3.2.10. “It is estimated that the analytics-ready structured data forms only a small subset of big data. The unstructured data, especially data in video format, is the largest component of big data that is only partially archived” (Gandomi and Haider, 2015);

Point 3.2.11. “Traditional data management systems are not capable of handling huge data feeds instantaneously. This is where big data technologies come into play. They enable firms to create real-time intelligence from high volumes of ‘perishable’ data” (Gandomi and Haider, 2015);

Point 3.2.12. “….. Data is not only becoming more available but also more understandable to computers…” (Lohr, 2012);

Theme 3.3: The data analysis-related

Point 3.3.1. “..Big Data introduces two new popular types of social networks derived from data traces: ‘articulated networks’ and ‘behavioral networks’…” (Boyd and Crawford, 2012);

Point 3.3.2. “‘An anthropologist working for Facebook or a sociologist working for Google will have access to data that the rest of the scholarly community will not’. Some companies restrict access to their data entirely; others sell the privilege of access for a fee; and others offer small data sets to university-based researchers. This produces considerable unevenness in the system…” (Boyd and Crawford, 2012);

Point 3.3.3. “….Understanding networks and network formation is a core topic in complexity research and its underlying sociological and social-psychological processes should receive more attention in the analysis of Big Data…” (Rainie and Wellman, 2012);

Point 3.3.4. “Too often, Big Data enables the practice of apophenia: seeing patterns where none actually exist, simply because enormous quantities of data can offer connections that radiate in all directions….” (Boyd and Crawford, 2012);

Point 3.3.5. “researchers have the tools and the access, while social media users as a whole do not. Their data were created in highly context-sensitive spaces, and it is entirely possible that some users would not give permission for their data to be used elsewhere…” (Boyd and Crawford, 2012);

Point 3.3.6. “..….Because large data sets can be modeled, data are often reduced to what can fit into a mathematical model. Yet, taken out of context, data lose meaning and value..” (Boyd and Crawford, 2012);

Point 3.3.7. “As Gitelman (2011) observes, data need to be imagined as data in the first instance, and this process of the imagination of data entails an interpretative base: ‘every discipline and disciplinary institution has its own norms and standards for the imagination of data’…” (Boyd and Crawford, 2012);

Point 3.3.8. “..…Big Data and whole data are also not the same. Without taking into account the sample of a data set, the size of the data set is meaningless…..” (Boyd and Crawford, 2012);

Point 3.3.9. “in order to enable effective data analysis, we shall pre-process data under any circumstances to integrate the data from different sources..” (Chen, Mao and Liu, 2014);

Point 3.3.10. “the following techniques represent a relevant subset of the tools available for big data analytics. .. Text analytics …. Audio analytics …. Video analytics….. Social media analytics…. Predictive analytics” (Gandomi and Haider, 2015);

Point 3.3.11. “..we can divide data analysis research into six key technical fields, i.e., structured data analysis, text data analysis, web data analysis, multimedia data analysis, network data analysis, and mobile data analysis…” (Chen, Mao and Liu, 2014);

Point 3.3.12. “The overall process of extracting insights from big data can be broken down into five stages ….. These five stages form the two main sub-processes: data management and analytics” (Gandomi and Haider, 2015);

Point 3.3.13. “If you have a random way of showing people different things on your website, then you can pretty quickly, with a very small number of observations, start to figure out what’s working and what isn’t. In real time, you can begin to refine your presentation…” (Ransbotham, 2012);

Point 3.3.14. “Research insights can be found at any level, including at very modest scales. In some cases, focusing just on a single individual can be extraordinarily valuable…” (Boyd and Crawford, 2012);

Theme 3.4: General management-related

Point 3.4.1. “Executives interested in leading a big data transition can start with two simple techniques. First, they can get in the habit of asking “What do the data say?” when faced with an important decision and following up with more-specific questions such as “Where did the data come from?,” “What kinds of analyses were conducted?,” ….. Second, they can allow themselves to be overruled by the data …” (McAfee and Brynjolfsson, 2012);

Point 3.4.2. “…Five Management Challenges… a transition to using big data….. Leadership…. Talent management. … Technology. … Decision making. …. Company culture. …” (McAfee and Brynjolfsson, 2012);

Point 3.4.3. “The more companies characterized themselves as data-driven, the better they performed on objective measures of financial and operational results…” (McAfee and Brynjolfsson, 2012);

Point 3.4.4. “..…companies that learn to take advantage of big data will use realtime information from sensors, radio frequency identification and other identifying devices to understand their business environments at a more granular level, to create new products and services, and to respond to changes in usage patterns as they occur…” (Davenport, Barth and Bean, 2012);

Point 3.4.5. “Mayer-Schönberger and Cukier explain three new imperatives: 1. Use all the data, not just a sample… 2. Accept messiness [Inaccuracies in measurements] 3. Embrace correlation” (Hayashi, 2014);

Point 3.4.6. “Big data are worthless in a vacuum. Its potential value is unlocked only when leveraged to drive decision making. To enable such evidence-based decision making, organizations need efficient processes to turn high volumes of fast-moving and diverse data into meaningful insights” (Gandomi and Haider, 2015);

Point 3.4.7. “Through research on the five core industries that represent the global economy, the McKinsey report pointed out that big data may give a full play to the economic function, improve the productivity and competitiveness of enterprises and public sectors, and create huge benefits for consumers.” (Chen, Mao and Liu, 2014);

Point 3.4.8. “we can identify big data’s key elements. First, companies can now collect data across business units and, increasingly, even from partners and customers (some of this is truly big, some more granular and complex). Second, a flexible infrastructure can integrate information and scale up effectively to meet the surge. Finally, experiments, algorithms, and analytics can make sense of all this information….” (Brown, Chui and Manyika, 2011);

Point 3.4.9. “Some literature … discuss obstacles in the development of big data applications. The key challenges are listed as follows: Data representation… Redundancy reduction and data compression… Data life cycle management… Analytical mechanism… Data confidentiality… Energy management… Expendability and scalability… Cooperation” (Chen, Mao and Liu, 2014);

Theme 4: Policy considerations on Big Data

Point 4.1. “.. major developed countries, including the US and UK, are preparing diverse policies and measures that include bolstering R&D investments and fostering experts to activate the big data industry in a bid to become competitive in the smart ecosystem environment.” (Kwon, Kwak and Kim, 2015);

Point 4.2. “Byung-Yeol et al. (2013) [Jang, 2013] emphasized the importance of creating convergence services based on big data.” (Kwon, Kwak and Kim, 2015);

Point 4.3. “Kyu-nam (2014) [Kim, 2014] insisted that, in order for the big data industry to generate value as a future growth engine, we should establish a structural framework based on social consensus beforehand” (Kwon, Kwak and Kim, 2015);

Point 4.4. “… Becoming data scientists requires the convergence of various educational fields such as mathematics, science, statistics, IT, and business.” (Kwon, Kwak and Kim, 2015);

Point 4.5. “..the use of big data and predictive analytics raises a number of difficult issues. One very hot topic is privacy concerns. In 2012, Target ignited a media firestorm after consumers learned that the company was using its quantitative methods to predict which customers were pregnant” (Hayashi, 2014);

Referring to Figure 1, there are four main themes, namely, “Definitions and characteristics of Big Data” (theme 1), “Business and technological trends related to Big Data” (theme 2), “Management practices and challenges related to Big Data” (theme 3), and “Policy considerations on Big Data” (theme 4). Themes 2 and 3 have sub-themes. Each of the themes has a set of associated points (i.e., ideas, viewpoints, concepts and findings). Together they provide an organized way to comprehend the knowledge structure of the Big Data topic. The references given with the points tell readers which academic articles to consult for more details on these points. The process of conducting the thematic analysis is an exploratory as well as synthetic learning endeavour on the literature. Now that the structure of the themes, sub-themes and their associated points is finalized, the reviewer is in a position to move forward to step 2 of the MMBLR approach. The MMBLR approach step 2 finding, i.e., a companion mind map, is presented in the next section.
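For readers who keep their review notes in electronic form, the theme, sub-theme and point structure just described can be recovered from the point labels alone. The following is a minimal Python sketch, not part of the MMBLR approach itself. It assumes the numbering convention used in this article: a two-part label such as 1.2 is a point directly under theme 1, and a three-part label such as 3.2.1 is a point under sub-theme 3.2. The function name `group_points` and the small sample list are hypothetical and chosen for illustration only; the citations in the sample are copied from the findings above.

```python
from collections import defaultdict

# Hypothetical sample: (point label, citation) pairs copied from the findings.
POINTS = [
    ("1.1", "TechAmerica Foundation's Federal Big Data Commission, 2012"),
    ("1.2", "Gandomi and Haider, 2015"),
    ("2.1.1", "Davenport, Barth and Bean, 2012"),
    ("2.2.1", "Lohr, 2012"),
    ("3.1.1", "Chen, Mao and Liu, 2014"),
    ("3.2.1", "Jacobs, 2009"),
    ("3.2.2", "Chen, Mao and Liu, 2014"),
    ("4.1", "Kwon, Kwak and Kim, 2015"),
]


def group_points(points):
    """Group points by theme and, where present, by sub-theme.

    A two-part label ("1.2") has no sub-theme, so it is filed under None.
    A three-part label ("3.2.1") is filed under its sub-theme ("3.2").
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for label, citation in points:
        parts = label.split(".")
        theme = parts[0]
        sub_theme = ".".join(parts[:2]) if len(parts) == 3 else None
        grouped[theme][sub_theme].append((label, citation))
    # Convert to plain dicts so the result prints and compares cleanly.
    return {theme: dict(subs) for theme, subs in grouped.items()}


structure = group_points(POINTS)
for theme, subs in structure.items():
    for sub_theme, items in subs.items():
        heading = f"Theme {sub_theme}" if sub_theme else f"Theme {theme}"
        print(heading, "->", [label for label, _ in items])
```

A grouping of this kind only mirrors the labels that the reviewer has already assigned; deciding which theme a point belongs to remains the interpretive part of step 1.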

Mind mapping on the Big Data theme

By adopting the findings from the MMBLR approach step 1 on the Big Data topic, the writer constructs a companion mind map shown as Figure 1.

Referring to the mind map on Big Data, the topic label is shown right at the centre of the map as a large blob. Four main branches are attached to it, corresponding to the four themes identified in the thematic analysis. In the same vein, two branches, associated with themes 2 and 3, have sub-branches, which represent the sub-themes recognized in the thematic analysis findings (i.e., the MMBLR approach step 1). The links and ending nodes with key phrases represent the points from the thematic analysis. As a whole, the mind map renders an image of the knowledge structure on Big Data based on the thematic analysis findings. See also the Literature on big data Facebook page for additional information on Big Data. Constructing the mind map is part of the reviewer’s learning process in conducting the literature review. On the whole, the mind mapping process is speedy and entertaining. The resultant mind map also serves as useful presentation and teaching material. This mind mapping experience confirms the writer’s previous experience in using the MMBLR approach (Ho, 2016).
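The layout described above (a central topic, main branches for themes, sub-branches for sub-themes, and ending nodes with key phrases) is a tree, so it can also be drafted as a plain-text outline before drawing. The Python sketch below is one possible way to do this and is not the tool used to produce Figure 1. The theme and sub-theme names are taken from the thematic analysis; the key phrases on the ending nodes are hypothetical abbreviations of individual points, and the helper names `to_outline` and `count_leaves` are likewise illustrative.

```python
# Central topic -> main branches (themes) -> sub-branches or ending nodes.
# Theme and sub-theme names come from the thematic analysis; the key phrases
# on the ending nodes are hypothetical abbreviations chosen for illustration.
MIND_MAP = {
    "Big Data": {
        "Definitions and characteristics": {"Three V's": {}},
        "Business and technological trends": {
            "Business-related": {"Data-driven decisions": {}},
            "Technology-related": {"Real-time analytics": {}},
        },
        "Management practices and challenges": {
            "Associated technology-related": {"Cloud computing": {}},
            "Data management-related": {"Hadoop": {}},
            "Data analysis-related": {"Apophenia": {}},
            "General management-related": {"Company culture": {}},
        },
        "Policy considerations": {"Privacy concerns": {}},
    }
}


def to_outline(tree, depth=0):
    """Return the tree as a list of indented outline lines (two spaces per level)."""
    lines = []
    for label, children in tree.items():
        lines.append("  " * depth + "- " + label)
        lines.extend(to_outline(children, depth + 1))
    return lines


def count_leaves(tree):
    """Count ending nodes, i.e. nodes that have no children."""
    return sum(count_leaves(children) if children else 1 for children in tree.values())


print("\n".join(to_outline(MIND_MAP)))
print("Ending nodes:", count_leaves(MIND_MAP))
```

An outline of this kind can be pasted into most mind mapping tools that accept indented text, although the exact import format varies by tool and should be checked against the tool's own documentation.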

Concluding remarks

The application of the MMBLR approach to Big Data reported here mainly serves to illustrate its practice, as its procedures have already been refined through its use on an array of topics (Ho, 2016). This article introduces neither new steps nor new ideas to the approach. In this respect, the exercise reported here primarily offers some pedagogical value as well as some stimulated learning on Big Data. Nevertheless, the thematic findings and the image of the knowledge structure on Big Data in the form of a mind map should also be of academic value to those who research this topic.

Bibliography

1. Boyd, D. and K. Crawford. 2012. “Critical questions for Big Data” Information, Communication & Society 15(5): 662-679 (DOI: 10.1080/1369118X.2012.678878).

2. Brown, B., M. Chui and J. Manyika. 2011. “Are you ready for the era of ‘big data’?” McKinsey Quarterly October: 1-12.

3. Chen, M., S. Mao and Y. Liu. 2014. “Big Data: A Survey” Mobile Networks and Applications 19: 171-209.

4. Davenport, T.H., P. Barth and R. Bean. 2012. “How ‘big Data’ Is Different” MIT Sloan Management Review 54(1) Fall: 43-46.

5. Gandomi, A. and M. Haider. 2015. “Beyond the hype: Big data concepts, methods, and analytics” International Journal of Information Management 35, Elsevier: 137-144.

6. Gitelman, L. 2011. “Notes for the Upcoming Collection ‘Raw Data’ is an Oxymoron” [Online] (url address: https://files.nyu.edu/lg91/public/) (Visited at July 23, 2011).

7. Hayashi, A.M. 2014. “Thriving in a Big Data World” MIT Sloan Management Review 55(2) Winter: 35-39.

8. Ho, J.K.K. 2016. Mind mapping for literature review – a ebook, Joseph KK Ho publication folder October 7 (url address: http://josephkkho.blogspot.hk/2016/10/mind-mapping-for-literature-review-ebook.html).

9. Jacobs, A. 2009. “The Pathologies of Big Data” Communications of the ACM 52(8) August: 36-44.

10. Jang, B.Y., et al., 2013. “Big data-based converged service development policies” STEPI. Science & Technology Policy 23 (3): 4–16.

11. Kim, K.N. 2014. “Big Data 2.0 Era, Key Issues and Political Implications” Korea Information Society Development Institute.

12. Kwon, T.H., J.H. Kwak and K. Kim. 2015. “A study on the establishment of policies for the activation of a big data industry and prioritization of policies: Lessons from Korea” Technological Forecasting & Social Change 96, Elsevier: 144-152.

13. Laney, D. 2001. “3-d data management: controlling data volume, velocity and variety” META Group Research Note, February 6.

14. Literature on big data Facebook page, maintained by Joseph, K.K. Ho (url address: https://www.facebook.com/Literature-on-big-data-1780021068946904/).

15. Literature on literature review Facebook page, maintained by Joseph, K.K. Ho (url address: https://www.facebook.com/literature.literaturereview/).

16. Literature on mind mapping Facebook page, maintained by Joseph, K.K. Ho (url address: https://www.facebook.com/literature.mind.mapping/).

17. Lohr, S. 2012. “The Age of Big Data” The New York Times February 11.

18. McAfee, A. and E. Brynjolfsson. 2012. “Big Data: The Management Revolution” Harvard Business Review October: 60-68.

19. Rainie, L. and B. Wellman. 2012. Networked. The new social operating system, MIT Press. Cambridge.

20. Ransbotham, S. 2012. “Why Detailed Data Is As Important As Big Data” Interviewed by Kiron, D. MIT Sloan Management Review April: 1-5.

21. Snijders, C., U. Matzat and U. Reips. 2012. “‘Big Data’: Big Gaps of Knowledge in the Field of Internet Science” International Journal of Internet Science 7(1): 1-5.

22. TechAmerica Foundation’s Federal Big Data Commission. 2012. “Demystifying big data: A practical guide to transforming the business of Government” (url address: http://www.techamerica.org/Docs/fileManager.cfm?f=techamerica-bigdatareport-final.pdf).

Joseph KK Ho e-resources

Sunday, 9 October 2016
