Data Engineer - Sandton

R 540 000 per annum, Sandton, Gauteng, 25-10-2016 7:43:36 AM
22-11-2016 7:43:36 AM
Job Title: Data Engineer

Location: Sandton

Salary: R540 000.00 per annum

A leading company in the telecoms industry is looking for a Data Engineer to join a central team tasked with consolidating and extracting value from diverse datasets across the group's business units. Your main responsibility will be building and maintaining an advanced analytics capability at a scale of 500 million transactions per month.

You will manage and upgrade a platform for distributed processing of transactional data at scale (Apache Spark). You will take over and streamline the ETL jobs that load master data into a SQL data warehouse. You will support the use of both SQL and HDFS, and productionise data flows. You will also advise on cloud IaaS or PaaS options and optimize cloud resource utilization.

A solid foundation in data processing techniques, as well as the drive and aptitude to identify and adopt promising new technologies, is essential. Expertise in SQL and NoSQL ecosystems is required.

In summary, you will work with a data scientist to maintain a big data platform that (1) scales to the transactional load, (2) enables distributed, high-performance analytical processing and predictive modelling, and (3) serves a self-service BI layer with aggregated results. The current infrastructure (see below) is containerized and orchestrated with Rancher, so DevOps evangelism or cloud experience will be highly advantageous.


Key Responsibility Areas

Core Functions:

• Understand each company’s data sets
• Understand both the business and the data analysis goals
• Advise on data architecture
• Maintain analytics services (Spark, Jupyter, PySpark)
• Ensure effective infrastructure for processing large volumes of transactional data (Spark)
• Assist in the preparation of analytical datasets for data mining
• Ensure security on data platforms
• ETL development and maintenance
• Assess scalability and plan the technology roadmap
• Document the planning, implementation and operation of the data platform

Skills/Requirements

• Detail-orientated and able to handle multiple tasks at once
• Excellent written and verbal communication skills
• Good business acumen
• Good interpersonal skills
• Good organization skills
• A high level of technical understanding
• Must have a good command of the English language
• Must have a valid driver's license
• Keen project management skills with an ability to interact with and motivate others to succeed on several fronts simultaneously
• Presentation skills
• Prioritizes workload and meets deadlines for a variety of marketing deliverables
• Problem solving skills
• Professional
• Results-orientated
• Self-motivated and able to work as a member of a team
• Strong knowledge of business models

Education & Qualification

Required:

• ETL experience with SSIS, SAP Data Services, IBM, Pentaho or equivalent.
• Experience with open-source big data infrastructure: Hadoop, Spark, MongoDB, Hive, HBase.
• Experience preparing datasets for consumption by BI tools and analysts.
• Comfortable with Linux command line.
• Git version control.
• Fluent with both relational algebra and MapReduce logic.
• Experience implementing and using multithreaded data processing.
• Ability to interrogate large data files on the fly.

Advantageous:

• Postgraduate degree in computer science, physics, applied mathematics, mathematics or another computational, data-intensive field.
• Servers and networking
• Python
• Experience with BI tools
• Virtualization technologies (Docker / Vagrant)
• Microsoft Azure or other cloud platform

Recruiter: 54 Recruitment