The Data Engineering Cookbook.pdf
3.27MB
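At 91 pages in total, this is not a long book, but it collects and organizes knowledge related to data engineering.
It also covers many topics that data scientists should know, such as Docker and REST APIs. Although it is a data engineering book, I (personally) think data scientists should have at least this much knowledge too.
The related code is also available on the author's GitHub (github.com/andkret/Cookbook).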
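The contents are listed below.
Rather than covering every topic in detail, the book mostly takes the form of collected links.
For example, clicking the Data Science @Uber link leads to the related tech blog posts and links.
I am sharing it because it looks like a resource many people will find useful :)
If I get the chance, I will summarize it bit by bit and share that as well.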
Contents:
- Introduction
- Basic Engineering Skills
- Advanced Engineering Skills
- Hands On Course
- Case Studies
- Best Practices Cloud Platforms
- 130+ Data Sources Data Science
- 1001 Interview Questions
- Recommended Books and Courses
Full Table Of Contents:
Introduction
- What is this Cookbook
- Data Engineer vs Data Scientist
- My Data Science Platform Blueprint
- Who Companies Need
Basic Engineering Skills
- Learn To Code
- Get Familiar With Git
- Agile Development
- Software Engineering Culture
- Learn how a Computer Works
- Data Network Transmission
- Security and Privacy
- Linux
- Docker
- The Cloud
- Security Zone Design
Advanced Engineering Skills
- Data Science Platform
- Hadoop Platforms
- Connect
- Buffer
- Processing Frameworks
- Lambda and Kappa Architecture
- Batch Processing
- Stream Processing
- Should You do Stream or Batch Processing
- Is ETL still relevant for Analytics?
- MapReduce
- Apache Spark
- What is the Difference to MapReduce?
- How Spark Fits to Hadoop
- Spark vs Hadoop
- Spark and Hadoop a Perfect Fit
- Spark on YARN
- My Simple Rule of Thumb
- Available Languages
- Spark Driver Executor and SparkContext
- Spark Batch vs Stream processing
- How Spark uses Data From Hadoop
- What are RDDs and How to Use Them
- SparkSQL How and Why to Use It
- What are Dataframes and How to Use Them
- Machine Learning on Spark (TensorFlow)
- MLlib
- Spark Setup
- Spark Resource Management
- AWS Lambda
- Apache Flink
- Elasticsearch
- Apache Drill
- StreamSets
- Store
- Visualize
- Machine Learning
- How to do Machine Learning in production
- Why machine learning in production is harder than you think
- Models Do Not Work Forever
- Where are The Platforms That Support Machine Learning
- Training Parameter Management
- How to Convince People That Machine Learning Works
- No Rules No Physical Models
- You Have The Data. Use It!
- Data is Stronger Than Opinions
- AWS Sagemaker
Hands On Course
- What We Want To Do
- Thoughts On Choosing A Development Environment
- A Look Into the Twitter API
- Ingesting Tweets with Apache Nifi
- Writing from Nifi to Apache Kafka
- Apache Zeppelin Data Processing
- Switch Processing from Zeppelin to Spark
Case Studies
- Data Science @Airbnb
- Data Science @Amazon
- Data Science @Baidu
- Data Science @Blackrock
- Data Science @BMW
- Data Science @Booking.com
- Data Science @CERN
- Data Science @Disney
- Data Science @DLR
- Data Science @Drivetribe
- Data Science @Dropbox
- Data Science @Ebay
- Data Science @Expedia
- Data Science @Facebook
- Data Science @Google
- Data Science @Grammarly
- Data Science @ING Fraud
- Data Science @Instagram
- Data Science @LinkedIn
- Data Science @Lyft
- Data Science @NASA
- Data Science @Netflix
- Data Science @OLX
- Data Science @OTTO
- Data Science @Paypal
- Data Science @Pinterest
- Data Science @Salesforce
- Data Science @Siemens Mindsphere
- Data Science @Slack
- Data Science @Spotify
- Data Science @Symantec
- Data Science @Tinder
- Data Science @Twitter
- Data Science @Uber
- Data Science @Upwork
- Data Science @Woot
- Data Science @Zalando
Best Practices Cloud Platforms
130+ Free Data Sources For Data Science
- General And Academic
- Content Marketing
- Crime
- Drugs
- Education
- Entertainment
- Environmental And Weather Data
- Financial And Economic Data
- Government And World
- Health
- Human Rights
- Labor And Employment Data
- Politics
- Retail
- Social
- Travel And Transportation
- Various Portals
- Source Articles and Blog Posts
- Free Data Sources Data Science
1001 Interview Questions
Recommended Books and Courses