The incumbent will join a central team tasked with consolidating and extracting value from diverse datasets originating in the group's business units. The incumbent's main responsibility will be building and maintaining an advanced analytics capability at a scale of 100 million transactions per month.
The incumbent will:
- implement and manage a platform for distributed processing of transactional data (Hadoop & Spark);
- take over and streamline the ELT jobs that load master data into a SQL data warehouse;
- support the use of both SQL and HDFS, and productionise data flows;
- advise on IaaS vs PaaS in the cloud and optimise cloud resource utilisation.
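As a rough illustration of the ELT pattern referred to above (load raw rows first, transform later in SQL), here is a minimal sketch in Python, using an in-memory sqlite3 database as a stand-in for the warehouse; the table and column names are hypothetical, not part of the role:

```python
import csv
import io
import sqlite3

def load_master_data(conn, csv_text):
    """Extract rows from a CSV export and load them as-is into a
    master_data table; transformation happens afterwards in SQL (ELT)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS master_data "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    rows = csv.DictReader(io.StringIO(csv_text))
    conn.executemany(
        "INSERT INTO master_data (id, name, amount) VALUES (?, ?, ?)",
        [(int(r["id"]), r["name"], float(r["amount"])) for r in rows],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
load_master_data(conn, "id,name,amount\n1,acme,10.5\n2,globex,20.0\n")
# A downstream "transform" step is then just SQL on the loaded table:
total = conn.execute("SELECT SUM(amount) FROM master_data").fetchone()[0]
```

In the actual role the load target would be the group's SQL data warehouse rather than sqlite, but the extract-load-then-transform shape is the same.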
Qualifications & Experience required:
- ETL experience with SSIS, SAP Data Services, IBM, Pentaho or equivalent.
- Open Source Big Data infrastructure: Hadoop, Spark, MongoDB, Hive, HBase
- Experience preparing datasets for consumption by BI tools and analysts
- Comfortable with Linux command line
- Git version control
- Fluency in both relational algebra and MapReduce logic
- Experience with implementing/utilising multithreaded data processing
- Ability to integrate large data files on the fly
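To make the MapReduce and multithreading items above concrete, here is a toy sketch in plain Python: per-chunk "map" counts merged by a "reduce" step, with the map phase fanned out over a thread pool. It is a teaching example only, not a substitute for Hadoop/Spark:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def map_phase(chunk):
    # Map: emit a partial word count for one chunk of input text.
    return Counter(chunk.split())

def reduce_phase(partials):
    # Reduce: merge the per-chunk partial counts into one result.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

chunks = ["spark spark hadoop", "hadoop hive", "spark"]
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = pool.map(map_phase, chunks)
counts = reduce_phase(partials)
```

The same split-map-shuffle-reduce logic is what Hadoop and Spark apply across a cluster instead of across threads.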
Advantageous:
- Postgraduate degree in computer science, physics, applied mathematics, mathematics, or another computational, data-intensive field
- Servers and networking
- Python
- Experience with BI Tools
- Docker/Vagrant
- Microsoft Azure or other cloud platform