Data Engineer (JHB)
- Johannesburg, Gauteng
- Permanent
- Full-time
Responsibilities:
- Design, develop, and maintain scalable and efficient data pipelines and ETL processes to ingest, transform, and load data from various sources.
- Develop and implement data warehousing solutions.
- Collaborate with cross-functional teams to integrate various data sources.
- Ensure data quality and consistency across different data systems.
- Optimise data retrieval for dashboard/reporting solutions.
- Optimise data infrastructure, including data storage, data retrieval, and data processing for enhanced performance and scalability.
- Implement data quality and data governance processes to ensure accuracy, consistency, and integrity of data.
- Monitor and troubleshoot data pipelines to identify and resolve issues in a timely manner.
- Perform data profiling and analysis to identify data quality issues and propose improvements.
- Collaborate with data scientists and analysts to provide them with the necessary data sets for analysis and reporting.
- Stay up to date with emerging technologies and trends in data engineering and recommend new tools and frameworks to improve data infrastructure.
- Willingness to contribute hands-on to BI analytics tasks, including report development and maintenance, SQL writing, and refactoring.
Requirements:
- Bachelor's degree in Computer Science, Engineering, or a related field preferred.
- Cloud or data certifications (AWS, GCP, Azure, Microsoft) are a plus.
- Proven experience as a Data Engineer or similar role, with a strong understanding of data modelling, data warehousing, and ETL processes.
- Proficiency in SQL and experience working with relational databases (e.g., PostgreSQL, MySQL, SQL Server) and NoSQL databases (e.g., MongoDB, Cassandra).
- Strong programming skills in at least one scripting language (e.g., Python) and experience with data manipulation and transformation libraries (e.g., Pandas, PySpark).
- Comfortable working with cloud-based infrastructure and services on Amazon Web Services (AWS) or Azure.
- Familiarity with data pipeline orchestration tools (e.g., Apache Airflow, Luigi, AWS Lambda) and workflow management systems.
- Experience with real-time data streaming technologies (e.g., Apache Kafka, Apache Flink).
- Knowledge of containerisation technologies and orchestration tools (e.g., Docker, Kubernetes).
- Familiarity with machine learning concepts and frameworks.