data engineer for multiple energy companies, 2018 - present.
streaming data from 8 subreddits using the reddit api, kafka and pyspark into delta tables on an s3 data lake, transformed by glue jobs and queried with athena. ·...
api ingestion for multiple subreddits on a spark cluster. python kafka producer. · https://github.com/stevenhurwitt/reddit-streaming reddit streaming pyspark...