Standardising citizen data to facilitate Aadhaar integration

Project Category: Corporate

Sub Category: Best Government to Government (G2G) Initiative of the Year
Reference No: eMaharashtra2013/awards/152

Details of the Applicant:

Name: Kapil Vaidya
Designation: Business Manager
Organization: SAS Institute (India)
Address: 4th Floor, Apeejay House, 3 Dinshaw Wachha Road
City: Mumbai
State: Maharashtra
Country: India
Zip Code: 400020
Organization Website: www.sas.com/india

Brief Description:
The UID Innovation Center, set up by the Government of Maharashtra, spearheads several initiatives for data quality, cleansing and integration to make Aadhaar the basis for identity across systems. The Center works with the names of 52 million residents of Maharashtra. Dedicated teams from Mahaonline, Microsoft and SAS have carried out data mining, analysis and the development of customised utilities and solutions to make Aadhaar integration a success. Under the guidance and leadership of the Secretary IT, and with consultancy services provided by Ernst & Young and Accenture, the UID Innovation Center is pioneering the integration of Aadhaar across existing systems. SAS is one of the key solutions that has enabled the UID Innovation Center to handle large volumes of residents’ data efficiently: standardising the State Resident Data Hub (SRDH), matching data, performing automated seeding (integrating Aadhaar data with other data sources such as election and scholarship records), de-duplicating data, identifying ghost entries and creating a unified view of citizens’ data. SAS Data Management Solutions helped the UID Innovation Center increase efficiency, reduce turnaround times and deliver measurable value.

Objective:
‘Aadhaar’ communicates the fundamental role of the Unique Identity Number as a universally accepted identifier for interacting and transacting with each resident of India. Traditionally, the name of an individual has been one of a set of attributes used to perform this function. It is therefore critically important to have a context-rich understanding of patterns in name data from eGovernance systems in order to integrate and transition the traditional methods to Aadhaar-based systems. The UID Innovation Center, set up by the Government of Maharashtra, aimed to enhance data quality, spot duplicates and ghost entries, and automate the seeding of the extremely large volumes of citizen data for the State of Maharashtra.

Target Group: Government of Maharashtra

Geographical Reach within India: Maharashtra

Geographical Reach outside India: N/A

Date From which the Project became Operational: 2-1-2013

Is the Project still operational?: Yes

List 5 achievements of the programme/project/initiative:
1. Transforming the large dataset challenge into an operational advantage: SAS matched 42 million UID records with 70 million records of state election data and populated the matched election records with UID numbers. This process is called automated seeding. With the help of automated seeding, the Department of IT achieved lower turnaround times and less effort compared with the manual exercises conducted in the past.
2. Data enrichment: There were several errors and inconsistencies in the data, especially in the names of individuals. Hence, a set of rules was developed to clean and simplify the names in the SRDH. This rule set keeps evolving based on the outputs of the processes. This in turn led to standardisation of the data, which helps provide correct indicators for data-driven decision making and policy formulation.
3. Data de-duplication: Duplicate records in the database were investigated and removed before further processing for Aadhaar integration could be performed. This gave the DIT a unified single view of residents’ data. To put this into perspective, 11.81 lakh duplicate records were found and fixed in the election database.
4. Identification of ghost entries: Ghost entries were identified and flagged to the DIT. This gives the DIT the power to identify bogus entries and ensure that benefits are passed on to the deserving citizens.
5. Creation of reusable components: Reusable components and solutions were created which can now be leveraged to overcome similar challenges faced by other states and private agencies. The following reusable components and solutions developed by the UID Innovation Center would be made available on specific request made to the Directorate of Information Technology, Government of Maharashtra: – Algorithm / code for verification of UID numbers in eGovernance systems and beneficiary
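
The name standardisation, de-duplication and automated seeding steps listed above can be illustrated with a minimal Python sketch. This is not the SAS implementation or the Center's actual rule set, neither of which is described in this application. The honorific list, the field names (id, name, pincode, uid), the exact-match key on normalised name plus PIN code, and the placeholder UID values are all assumptions made for illustration.

```python
import string
import unicodedata
from collections import defaultdict

# Hypothetical honorifics to drop; the real SRDH rules are not given in the source.
HONORIFICS = {"shri", "smt", "mr", "mrs", "ms", "dr", "kumari"}

# Map every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def normalise_name(raw):
    """Return a simplified, word-order-independent key for a personal name."""
    text = unicodedata.normalize("NFKC", raw).lower()
    text = text.translate(_PUNCT_TO_SPACE)
    tokens = [t for t in text.split() if t not in HONORIFICS]
    return " ".join(sorted(tokens))


def match_key(rec):
    """Assumed matching key: normalised name plus PIN code."""
    return (normalise_name(rec["name"]), rec["pincode"])


def find_duplicates(records):
    """Return lists of record ids that share the same matching key."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


def seed_uid(uid_records, target_records):
    """Copy a UID onto each target record that matches exactly one UID record."""
    index = defaultdict(list)
    for rec in uid_records:
        index[match_key(rec)].append(rec["uid"])
    seeded = []
    for rec in target_records:
        candidates = index.get(match_key(rec), [])
        # Ambiguous or missing matches are left unseeded for manual review.
        uid = candidates[0] if len(candidates) == 1 else None
        seeded.append({**rec, "uid": uid})
    return seeded


if __name__ == "__main__":
    voters = [
        {"id": 1, "name": "Shri. Ramesh PATIL", "pincode": "400020"},
        {"id": 2, "name": "Patil, Ramesh", "pincode": "400020"},
        {"id": 3, "name": "Sunita Patil", "pincode": "400020"},
    ]
    uids = [{"uid": "UID-A", "name": "Sunita Patil", "pincode": "400020"}]
    print(find_duplicates(voters))
    print(seed_uid(uids, voters))
```

Exact matching on a normalised key is the simplest possible choice; a production system working across English and Marathi would more likely need transliteration and fuzzy matching, which this sketch does not attempt.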

List 5 Key challenges faced while implementing the project/programme/initiative and how they were overcome:
1. Large dataset: The data consisted of 52 million names of residents. To add to the problem, the names were in two languages – English and Marathi.
2. Data standardisation: The SRDH data was not standardised and had several data quality issues. Some of the key issues included the following: – The address fields in the SRDH data were not standardised. – There were errors in several data fields. – Marathi language fields were not standardised and transliteration was not effective.
3. Duplicate entries: There was a very large number of duplicate entries in the database. The duplications were mainly in the names and addresses of residents.
4. Ghost entries: Ghost entries are bogus entries in the database. At the start of the project, the database contained many such entries, and identifying them manually was a major task. At the same time, such entries cannot be overlooked because they can act as a major barrier to ensuring that benefits are passed on to citizens. For example, in the list of junior college students availing benefit under scholarship schemes, KYR+ verification may show results where actual
5. Lack of standard templates, processes and reusable components: This project was one of the first of its kind for the Government of Maharashtra, where the department aimed to provide unique identification to each citizen. Hence, there was no reference or reusable component available to solve this major data management issue.