CNI Hackathon '22
Data Science Challenge using BMTC Dataset
Contents
  1. Background
  2. Task
  3. Dataset
  4. Submission
  5. Evaluation criteria
  6. Useful links

Background

Task

Create a model to estimate the travel time, in minutes, between source-destination pairs using the provided dataset.


Dataset

The dataset consists of the following three files (download link):

  1. BMTC.parquet.gzip: It contains the GPS traces of around two thousand buses.
  2. Input.csv: It contains the geographical coordinates of various source-destination pairs.
  3. GroundTruth.csv: It contains the ground truth travel times between the source-destination pairs provided in Input.csv. It is provided to help participants assess their solutions.
The contents of these files are described in detail below.
BMTC.parquet.gzip:
The file contains five columns: BusID, Latitude, Longitude, Speed and Timestamp.

The following snapshot from the dataset shows the format:

       BusID  Latitude  Longitude  Speed            Timestamp
0  150212121  13.06593   77.45269     20  2019-08-01 18:59:18
1  150212121  13.06627   77.45211     27  2019-08-01 18:59:28
2  150212121  13.06661   77.45152     24  2019-08-01 18:59:38
3  150212121  13.06697   77.45089     28  2019-08-01 18:59:48
4  150212121  13.06727   77.45035     26  2019-08-01 18:59:58
5  150218000  13.00571   77.68619     46  2019-08-01 07:22:33
6  150218000  13.00525   77.68542     35  2019-08-01 07:22:42
7  150218000  13.00504   77.68509      0  2019-08-01 07:22:51
8  150218000  13.00504   77.68509      0  2019-08-01 07:23:01
9  150218000  13.00498   77.68497     13  2019-08-01 07:23:11

Note: The devices may not record data at uniform sampling intervals, and the recordings may be noisy.
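Because the sampling interval is not guaranteed to be uniform, it can help to measure the gap between consecutive records of each bus before building a model. The following is a minimal sketch with pandas, using a few rows copied from the snapshot above. The column name `GapSeconds` and the variable names are illustrative, and the commented-out file path is an assumption about where the download is stored.

```python
import pandas as pd

# A few rows copied from the snapshot above. The full file could presumably be
# loaded with pd.read_parquet("BMTC.parquet.gzip") (the path is an assumption).
traces = pd.DataFrame(
    {
        "BusID": [150212121] * 3 + [150218000] * 4,
        "Latitude": [13.06593, 13.06627, 13.06661,
                     13.00571, 13.00525, 13.00504, 13.00504],
        "Longitude": [77.45269, 77.45211, 77.45152,
                      77.68619, 77.68542, 77.68509, 77.68509],
        "Speed": [20, 27, 24, 46, 35, 0, 0],
        "Timestamp": pd.to_datetime(
            [
                "2019-08-01 18:59:18", "2019-08-01 18:59:28", "2019-08-01 18:59:38",
                "2019-08-01 07:22:33", "2019-08-01 07:22:42", "2019-08-01 07:22:51",
                "2019-08-01 07:23:01",
            ]
        ),
    }
)

# Order each bus's records in time so that differences are meaningful.
traces = traces.sort_values(["BusID", "Timestamp"]).reset_index(drop=True)

# Seconds elapsed since the previous record of the same bus
# (NaN for the first record of each bus).
traces["GapSeconds"] = (
    traces.groupby("BusID")["Timestamp"].diff().dt.total_seconds()
)

# Per-bus summary of the sampling interval.
gap_summary = traces.groupby("BusID")["GapSeconds"].agg(["min", "median", "max"])
print(gap_summary)
```

On the full dataset, unusually large gaps or repeated coordinates with zero speed may be worth inspecting before any trip segmentation; how to treat them is a modelling choice that the challenge description does not prescribe.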

Input.csv:
The file contains four columns: Source_Lat, Source_Long, Dest_Lat and Dest_Long.

The following shows the format of a typical input file:

   Source_Lat  Source_Long  Dest_Lat  Dest_Long
0   13.067272     77.45035  13.00525   77.68542
1   13.005042     77.68509  13.06627   77.45211
2   13.065925     77.45269  13.00498   77.68497
3   13.005247     77.68542  13.06661   77.45152

GroundTruth.csv:

The file contains a single column, TT, which holds the actual travel time for each source-destination pair. The value in the i-th row corresponds to the i-th source-destination pair in Input.csv.

The following shows the format of a typical ground truth file:

     TT
0  1.99
1  6.21
2  7.34
3  5.20

You can use the ground truth from the dataset to check if your code is working well.
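One way to run such a check is to compare your estimates with the TT column row by row. The sketch below uses mean absolute error (MAE) purely as an assumption, since the official metric is not stated in this section. The TT values are taken from the sample ground truth table, and the estimates are the sample ETT values shown in the Output section.

```python
import pandas as pd

# Sample values copied from the tables in this document.
ground_truth = pd.DataFrame({"TT": [1.99, 6.21, 7.34, 5.20]})
estimates = pd.Series([2.34, 5.51, 3.72, 5.13], name="ETT")

# Row i of the estimates must correspond to row i of GroundTruth.csv.
assert len(estimates) == len(ground_truth)

# Absolute error per source-destination pair, then the average.
# MAE is an assumed metric, not necessarily the official one.
abs_errors = (estimates - ground_truth["TT"]).abs()
mae = abs_errors.mean()
print(f"MAE: {mae:.3f} minutes")
```

With real data, `ground_truth` would presumably come from `pd.read_csv("GroundTruth.csv")`; the exact path depends on where the download is stored.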


Output (Estimated Travel Time)

Your output will be the estimated travel time (ETT), in minutes, between a given source-destination pair. For each source-destination pair, you should fill this value in the ETT column of a pandas dataframe, as illustrated below:

   Source_Lat  Source_Long  Dest_Lat  Dest_Long   ETT
0   13.067272     77.45035  13.00525   77.68542  2.34
1   13.005042     77.68509  13.06627   77.45211  5.51
2   13.065925     77.45269  13.00498   77.68497  3.72
3   13.005247     77.68542  13.06661   77.45152  5.13
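As a rough illustration of how the ETT column might be filled, the sketch below computes the great-circle (haversine) distance of each pair and divides it by a constant speed. This is a naive placeholder baseline, not the intended solution: the function names `haversine_km` and `add_ett` and the value of `ASSUMED_SPEED_KMPH` are hypothetical, and a real model would learn travel times from the GPS traces instead.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0
ASSUMED_SPEED_KMPH = 20.0  # hypothetical constant, not taken from the dataset


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (
        np.sin((lat2 - lat1) / 2) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def add_ett(inputs, speed_kmph=ASSUMED_SPEED_KMPH):
    """Return a copy of `inputs` with an ETT column in minutes."""
    out = inputs.copy()
    distance_km = haversine_km(
        out["Source_Lat"], out["Source_Long"], out["Dest_Lat"], out["Dest_Long"]
    )
    out["ETT"] = distance_km / speed_kmph * 60.0  # hours to minutes
    return out


# Sample pairs copied from the Input.csv example in this document.
inputs = pd.DataFrame(
    {
        "Source_Lat": [13.067272, 13.005042, 13.065925, 13.005247],
        "Source_Long": [77.45035, 77.68509, 77.45269, 77.68542],
        "Dest_Lat": [13.00525, 13.06627, 13.00498, 13.06661],
        "Dest_Long": [77.68542, 77.45211, 77.68497, 77.45152],
    }
)

result = add_ett(inputs)
print(result)
```

The row order of `result` matches the input, so its ETT values can be compared directly with the TT column of GroundTruth.csv.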


Submission

Evaluation criteria

Useful links

Contact us
For queries, email us at: admin@cnihackathon.in