Big Data Tutorial for Beginners
What is Big Data?
Consider a Microsoft Excel 2013 file that holds 1 million records and is 400 GB in size, stored on a computer with 500 GB of capacity. If the data in that file keeps growing exponentially, the computer will soon be unable to store or process it. For that computer, the data in the Excel file is Big Data. For someone with a high-end computer (larger hard disk, more memory) and applications capable of processing the file, the same data is not Big Data. In other words, Big Data is a collection of data so large and complex that it is difficult to process with the current infrastructure.
The same applies to organizations. Many have designed their IT infrastructure around their past, current, and anticipated needs. When trends change and data grows exponentially, that infrastructure can no longer store and process the data. For example, a company with 100 TB of disk storage and the applications to process it can hold at most 100 TB of data. Once its data grows beyond 100 TB, that data is Big Data for that company.
The meaning of Big Data therefore depends on the capability of an organization's IT infrastructure to store the data and of its applications to process it.
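The capacity-based view above can be sketched in a few lines of Python. This is only an illustration: the function names and the 10% growth per period are assumptions made for the example, not figures from any real system.

```python
def exceeds_capacity(data_size_gb: float, capacity_gb: float) -> bool:
    """Return True when the data no longer fits the available storage."""
    return data_size_gb > capacity_gb


def periods_until_full(data_size_gb: float, capacity_gb: float,
                       growth_factor: float) -> int:
    """Count growth periods until the data outgrows the capacity.

    growth_factor is an assumed multiplier per period (1.10 = 10% growth).
    """
    if growth_factor <= 1:
        raise ValueError("growth_factor must be greater than 1")
    periods = 0
    size = data_size_gb
    # Keep growing the data until it no longer fits on the disk.
    while size <= capacity_gb:
        size *= growth_factor
        periods += 1
    return periods


# The 400 GB Excel file on a 500 GB computer still fits today...
print(exceeds_capacity(400, 500))          # False
# ...but at an assumed 10% growth per period it outgrows the disk quickly.
print(periods_until_full(400, 500, 1.10))  # 3
```

The same check applies to the 100 TB company example: only the numbers change, which is why the same data can be Big Data for one infrastructure and ordinary data for another.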
Gartner (www.Gartner.com) defines Big Data as follows:
Big data is high-volume, -velocity and -variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
Big Data Examples:
- A savings account with 100 transactions a year produces 100 data records a year, or 2,500 records over 25 years. Retrieving data for that account means searching those 2,500 records. With more than 10 million such account holders, the total exceeds 25 billion records, so an efficient IT infrastructure is needed to store and process the data.
- Social networks such as Facebook (www.facebook.com), LinkedIn (www.linkedin.com), and Twitter (www.twitter.com) are used extensively to store users’ personal data and history (messages, resumes, photos, videos, etc.).
- Search engines such as Google, Yahoo, and Bing must quickly process each search request submitted by a user and store the search data. They also process, store, and publish online advertisements from clients (e.g. Google AdWords) and manage the advertisements published on the sites of website owners across the globe (e.g. Google AdSense).
- Email applications, chat boards, personal and official forums, groups (Google, Yahoo, etc.), blogs (Blogger, WordPress, Tumblr, etc.), and matrimonial websites generate large volumes of data every second.
- Large numbers of videos and images are uploaded to and downloaded from sites such as YouTube (owned by Google) and Flickr.
- Smartphone users (Apple, Samsung, Nokia, BlackBerry, etc.) exchange photos, videos, and clips, and make video calls. Millions of mobile applications have been developed and downloaded.
- People now use websites extensively to buy and sell products, for example www.amazon.com and www.ebay.com.
- Online trading of stocks, futures, and options has become popular. A record has to be stored for every buy and sell transaction.
- Government-related data and research data in various domains.
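The arithmetic behind the savings account example in the first bullet can be worked through as a short Python sketch. The 200 bytes per record is purely an assumed figure for illustration, and the function name is hypothetical; real record sizes vary widely.

```python
def total_records(transactions_per_year: int, years: int, accounts: int) -> int:
    """Number of transaction records accumulated across all accounts."""
    return transactions_per_year * years * accounts


# Figures from the savings account example.
per_account = total_records(100, 25, 1)
all_accounts = total_records(100, 25, 10_000_000)

# Assumption for illustration only: each record occupies 200 bytes.
ASSUMED_BYTES_PER_RECORD = 200
storage_tb = all_accounts * ASSUMED_BYTES_PER_RECORD / 1e12  # decimal terabytes

print(per_account)   # 2500
print(all_accounts)  # 25000000000
print(storage_tb)    # 5.0
```

A single account is trivial to search, but the same 25 years of history across 10 million accounts is 25 billion records, which is where storage and processing capacity start to matter.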