The Problem:
An Indianapolis-based nonprofit that regulates student athletes receives student transcripts from high schools around the country. The transcripts contain the required information, but in thousands of different formats. As a result, employees had to enter each document into a database by hand, a time-consuming and error-prone process that cost the organization hundreds of staff hours per week. Other local companies had been tasked with parsing the data but were ultimately unsuccessful.
The Goal: Parsing the Data
Systematically identify courses and grades for each academic year, regardless of transcript format, and parse the data from every transcript into a standard format that can eventually be entered into the client's transcript system. This reduces manual effort and cost.
How We Solved It:
- RoboSource used a text extraction library to pinpoint the x and y coordinates of each course name, course ID, and course grade.
- We then captured the coordinates of each data point for every document format.
- While parsing the data, we also pre-scrubbed it before loading it into the database. Until now, both of these steps had to be done manually.
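The coordinate-based approach described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not RoboSource's implementation: it assumes the text extraction library (which is not named here) has already produced each word with its x and y position, and that a hypothetical per-format `Template` records the x-range of each column. Words are grouped into rows by their y coordinate, assigned to fields by their x coordinate, and lightly pre-scrubbed. The `Word` and `Template` shapes, the grade pattern, and the sample data are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    """One extracted word and its page position (hypothetical shape)."""
    text: str
    x: float  # left edge
    y: float  # top edge, assumed to grow downward


@dataclass
class Template:
    """Hypothetical per-format template: the x-range of each column."""
    name: str
    columns: dict  # field name -> (x_min, x_max)
    row_tolerance: float = 3.0  # max y difference for words on one line


# Assumed grade formats: letters (optionally +/-), numeric scores, or P (pass).
VALID_GRADE = re.compile(r"^(?:[A-DF][+-]?|\d{1,3}|P)$")


def group_rows(words, tol):
    """Group words whose y coordinate is within `tol` of a row's first word."""
    rows = []
    for w in sorted(words, key=lambda w: (w.y, w.x)):
        if rows and abs(rows[-1][0].y - w.y) <= tol:
            rows[-1].append(w)
        else:
            rows.append([w])
    return rows


def scrub_grade(raw):
    """Normalize a grade string; return None if it is not a recognizable grade."""
    grade = raw.replace(" ", "").upper()
    return grade if VALID_GRADE.match(grade) else None


def extract_courses(words, template):
    """Map each row's words to template columns; keep rows with a valid grade."""
    records = []
    for row in group_rows(words, template.row_tolerance):
        row = sorted(row, key=lambda w: w.x)  # left-to-right within the row
        fields = {}
        for field, (x_min, x_max) in template.columns.items():
            parts = [w.text for w in row if x_min <= w.x < x_max]
            if parts:
                fields[field] = " ".join(parts)
        if set(fields) != set(template.columns):
            continue  # row is missing a column, e.g. a page header
        grade = scrub_grade(fields["grade"])
        if grade is None:
            continue  # e.g. the column-heading row
        fields["grade"] = grade
        records.append(fields)
    return records


# Hypothetical sample: a column-heading row plus two course rows.
SAMPLE_TEMPLATE = Template(
    name="example_format",
    columns={"course_id": (40, 110), "course_name": (110, 390), "grade": (390, 450)},
)
SAMPLE_WORDS = [
    Word("ID", 50, 80), Word("Course", 120, 80), Word("Grade", 400, 80),
    Word("ENG101", 50, 100), Word("English", 120, 100), Word("I", 165, 100.5),
    Word("A", 400, 101),
    Word("MTH201", 50, 120), Word("Algebra", 120, 120), Word("II", 170, 121),
    Word("B", 400, 120), Word("+", 408, 120),
]

records = extract_courses(SAMPLE_WORDS, SAMPLE_TEMPLATE)
for r in records:
    print(r)
```

Re-sorting each row by x matters because words on the same printed line can have slightly different y values; the heading row is dropped by the scrubbing step because "GRADE" is not a valid grade.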
Results:
- RoboSource demonstrated the ability to automate the identification and capture of transcript data across thousands of formats. Phase 2 of this project will save the client hundreds of thousands of dollars annually.
- We are also beginning to apply machine learning to previous transcripts so that, when the algorithm recognizes a transcript's format, it places the transcript into the correct template.
- Next steps are to build an API to the database and to use optical character recognition (OCR) to capture information from scanned transcripts and images.
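The source does not describe the model used to recognize transcript formats. Purely as an illustration, the sketch below swaps in one of the simplest learning approaches, a 1-nearest-neighbor classifier over word sets scored by Jaccard similarity: a new transcript receives the template label of its most similar past transcript, or no label if nothing is similar enough. The template labels, sample text, and the 0.3 threshold are hypothetical.

```python
import re


def tokens(text):
    """Lowercase alphanumeric tokens of a transcript's text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_template(text, labeled_examples, min_similarity=0.3):
    """1-nearest-neighbor: return the label of the most similar past transcript,
    or None if no example clears the (hypothetical) similarity threshold."""
    query = tokens(text)
    best_label, best_score = None, 0.0
    for example_text, label in labeled_examples:
        score = jaccard(query, tokens(example_text))
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= min_similarity else None


# Hypothetical labeled history: text from past transcripts and their template.
HISTORY = [
    ("Lincoln High School Official Transcript Course Title Grade Credits", "lincoln_v1"),
    ("Westfield Academy Student Record Subject Mark Units Earned", "westfield_v2"),
]

print(predict_template(
    "Lincoln High School Official Transcript Course Title Grade Credits Cumulative GPA",
    HISTORY,
))
print(predict_template("Riverside Prep Grade Report Semester", HISTORY))
```

Returning `None` for an unfamiliar layout lets such transcripts fall back to manual entry instead of being forced into the wrong template.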