Designed and built a consolidated enterprise data lake on AWS EMR and Apache Iceberg for a major research university, using an auto-relationalization engine to turn complex on-premises Workday and PeopleSoft data into queryable relational tables.
- Built an auto-relationalization engine in Python and Spark that automatically converts complex Workday and PeopleSoft XML into queryable Iceberg tables, end to end.
- Replaced Airflow with an event-driven Step Functions / EventBridge / Lambda architecture for the Student Data Warehouse, with dynamic DAG generation and no orchestration server to maintain.