Analysing Uber Data and Selecting Popular Destinations Using Big Data

ABSTRACT:
This paper presents an evaluation of the critical factors influencing the choice of destinations. The study focuses on the internal factors that motivate tourists to prefer one destination over another. It can assist relevant authorities and travel agencies in planning and promoting places of attraction with effective marketing strategies, and it can help tourists decide which main attractions to visit. An attempt has also been made to carry out a comparative study of the taxi aggregators that have radically changed the way "the great Indian middle class" commutes daily, with a focus on Uber. Currently, Uber cabs are following a strategy of expanding their operations and building a customer base in key metropolitan cities across India. The motive is to increase market share and achieve economies of scale while at the same time providing customer satisfaction. This article seeks to understand the dynamics of India's taxi market by studying factors such as pricing, market share, and revenue models. The project is qualitative in nature and based on secondary data collected from different sources.
EXISTING SYSTEM:
   
Customers are often dissatisfied with traditional cab companies because of their high prices and long waiting times, so Uber can exploit new and large markets in countries like India. It can also tap growing markets in suburban areas where taxi services are not available.
An unlimited fleet of vehicles is available. Regular taxi service regulations do not apply to Uber.
One key issue for Uber is the charge of fake bookings and delayed cancellations that it has levied against Ola in India. Ola, for its part, has benefited from deep-pocketed backers.

One documented means of reducing these overheads is to focus the routing optimizations on the paths to popular destinations, and thus to shift a large volume of traffic with a small number of path switches rather than shifting the traffic for all the prefixes.

We track traffic with three distinct predictors that vary in complexity: a very simple predictor, the Last Value (LV); the classical Moving Average (MA); and an adaptive but more complex predictor, the Low-pass Exponential Moving Average (LpEMA).
Traffic demands are consistent with Zipf's law, which implies that it is enough to take account of only a small fraction of the total number of destinations to control the routing of the majority of the traffic.
In view of this, we drew up a practical criterion for the selection of popular destinations.
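The criterion itself is not spelled out in this section. One plausible reading, given a Zipf-like distribution, is to keep the smallest set of destinations whose combined trip count reaches a target share of the total. The following is a minimal Java sketch under that assumption; the class name, the method name, and the target share are hypothetical and do not come from the source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class PopularDestinationSelector {
    // Returns the shortest list of destinations, taken in descending order of
    // trip count, whose cumulative count reaches at least targetShare of the total.
    // ASSUMPTION: a coverage threshold is the selection criterion.
    static List<String> select(Map<String, Long> tripCounts, double targetShare) {
        long total = 0;
        for (long count : tripCounts.values()) {
            total += count;
        }
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(tripCounts.entrySet());
        // Sort by count, largest first; break ties by name for a stable result.
        sorted.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> selected = new ArrayList<>();
        long covered = 0;
        for (Map.Entry<String, Long> entry : sorted) {
            if (total == 0 || covered >= targetShare * total) {
                break; // target share already reached
            }
            selected.add(entry.getKey());
            covered += entry.getValue();
        }
        return selected;
    }
}
```

With a strongly skewed distribution, a target share such as 0.8 would typically be reached by a small fraction of the destinations, which is the property the Zipf argument relies on.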
This section begins by describing the traffic engineering process. We then underline the importance, for traffic engineering, of the consistency of traffic demands with Zipf's law. Lastly, we describe the problem of selecting the popular destinations and present our proposal.
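Of the three predictors named earlier in this section, the two simpler ones can be sketched directly. LpEMA is left out because its update rule is not given here. The class and method names below are hypothetical, and the window length is a free parameter chosen by the caller.

```java
class TrafficPredictors {
    // Last Value (LV): predicts that the next sample equals the most recent one.
    static double lastValue(double[] history) {
        if (history.length == 0) {
            throw new IllegalArgumentException("history is empty");
        }
        return history[history.length - 1];
    }

    // Moving Average (MA): predicts the mean of the last `window` samples,
    // or of all samples when fewer than `window` are available.
    static double movingAverage(double[] history, int window) {
        if (history.length == 0 || window <= 0) {
            throw new IllegalArgumentException("need a non-empty history and a positive window");
        }
        int n = Math.min(window, history.length);
        double sum = 0.0;
        for (int i = history.length - n; i < history.length; i++) {
            sum += history[i];
        }
        return sum / n;
    }
}
```

LV reacts immediately to change but also follows every spike, whereas MA smooths noise at the cost of lagging behind a genuine shift in demand.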

DISADVANTAGES

However, there is a lack of simple and pragmatic methods for selecting popular destinations.
Low profit margins cause dissatisfaction among drivers. This might lead to bad publicity, which can in turn discourage new drivers from joining Uber.
There is little bond between Uber and its customers, so the incentive to remain with Uber is low.


PROPOSED SYSTEM:

We propose to find the days on which each base has the most trips.
We will also find the days on which each base has the most active vehicles.
The estimated time of arrival (ETA) can be reduced as the number of Uber drivers rises, which in turn will make Uber more popular with customers. The company will thus earn more revenue, and drivers will also profit.
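The per-base day analysis can be sketched in plain Java as follows. The record layout is an assumption, not something stated above: each line is taken to be comma-separated as base, date (M/d/yyyy), active vehicles, trips, with any header row already removed. The class and method names are hypothetical.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BusiestDayPerBase {
    // ASSUMPTION: dates look like 1/8/2015 (month/day/year).
    private static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("M/d/yyyy");

    // Sums one numeric column per (base, day of week).
    // ASSUMPTION: column 2 holds active vehicles and column 3 holds trips.
    static Map<String, Map<DayOfWeek, Long>> totals(List<String> lines, int column) {
        Map<String, Map<DayOfWeek, Long>> result = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            DayOfWeek day = LocalDate.parse(fields[1].trim(), DATE).getDayOfWeek();
            long value = Long.parseLong(fields[column].trim());
            result.computeIfAbsent(fields[0].trim(), k -> new TreeMap<>())
                  .merge(day, value, Long::sum);
        }
        return result;
    }

    // Picks, for each base, the day of week with the largest total.
    static Map<String, DayOfWeek> busiestDay(Map<String, Map<DayOfWeek, Long>> totals) {
        Map<String, DayOfWeek> best = new TreeMap<>();
        totals.forEach((base, byDay) -> best.put(base,
                byDay.entrySet().stream()
                     .max(Map.Entry.comparingByValue())
                     .get()   // safe: every base has at least one day
                     .getKey()));
        return best;
    }
}
```

Calling totals with column 3 and then busiestDay gives the busiest day by trips for each base; column 2 gives the same result for active vehicles. On a cluster the same grouping would be expressed as a keyed reduction over the distributed dataset instead of an in-memory map.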

Based on the data, we will find the top 20 destinations that people travel to most, the top 20 locations from which people travel most, and the top 20 cities that generate high airline revenues for travel, all based on booked trip count.

Top 20 destinations people travel to most: based on the given data, we can find the most popular destinations that people travel to frequently.
There are many destinations, of which we report only the top 20, based on the trips booked for each destination.
We create an RDD by loading the dataset, which is stored in HDFS.
We split each record on the tab delimiter because the data is tab-separated. We then create key-value pairs in which the key is the destination, taken from the third column, and the value is 1.
Since we need to count how often each destination city occurs, we use the reduceByKey method to sum the values for each key.
After counting the destinations, we swap each key-value pair so that the count becomes the key. The sortByKey method sorts the data by key, and the argument false requests descending order. Once the sorting is complete, we take the top 20 destinations.
In the same way, we can find the places from which most trips are undertaken, based on the booked trip count.
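The counting steps above can be sketched as follows. To stay runnable without a cluster, this version uses plain Java collections rather than Spark: the map merge stands in for reduceByKey, and the descending sort stands in for swapping the pairs and calling sortByKey with false. The class name, method name, and sample records are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TopDestinations {
    // Counts how often each value of one column occurs in tab-separated
    // records and returns the n most frequent values, largest count first.
    static List<Map.Entry<String, Integer>> topN(List<String> records, int column, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String record : records) {
            String[] fields = record.split("\t");
            // Emit (key, 1) and sum per key: the local analogue of reduceByKey.
            counts.merge(fields[column], 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        // Descending by count; ties are broken by name for a stable result.
        sorted.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        // Keep only the first n entries, or fewer if there are not that many.
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    public static void main(String[] args) {
        // Hypothetical sample: the destination is in the third column (index 2).
        List<String> records = List.of(
                "u1\tNYC\tLAX", "u2\tBOS\tLAX", "u3\tNYC\tSFO",
                "u4\tSEA\tLAX", "u5\tNYC\tSFO", "u6\tBOS\tMIA");
        for (Map.Entry<String, Integer> e : topN(records, 2, 20)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Passing the index of the origin column instead of 2 yields the most common starting locations. The text does not say which column holds the origin, so that index has to be taken from the dataset.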

ADVANTAGES:

The estimated time of arrival (ETA) can be reduced as the number of Uber drivers rises, which in turn will make Uber more popular with customers. The company will thus earn more revenue, and drivers will also profit.
The system is convenient for drivers: they can work flexible hours and can even choose to work part-time. Drivers can also reject unwanted clients.
Of the many destinations, only the top 20 are reported, based on the trips booked for each destination.
The sortByKey method sorts the data by key, with false requesting descending order, so the most popular destinations appear first.

SYSTEM ARCHITECTURE:

SYSTEM REQUIREMENTS:
HARDWARE REQUIREMENTS:

System : Intel i3
Hard Disk : 500 GB
Mouse : Logitech
RAM : 4 GB
Operating system : 64-bit


SOFTWARE REQUIREMENTS:

Operating system : Linux
Coding Language : Java, Spark
Database : HDFS
Tool : MapReduce, KNN

