This article gives the gist of how to build a data science platform from scratch for a startup, using real-life examples on Google Cloud Platform.
One of the first questions to ask when hiring for your startup is: how will data science improve our product? In our case the product is data, so the goal of data science aligns closely with the corporate goal of the company.
Some of the benefits of data science at a startup are:
- Building data products that enable new product features
- Running experiments to test product changes
- Identifying key business metrics to track and forecast
- Building predictive models of customer behavior
Tracking data: Covers how to capture data from applications and web pages, proposes different approaches to collecting and maintaining tracking data, and introduces concerns such as fraud and privacy.
Data pipelines: Presents different approaches to collecting data for use by analytics and data science teams, discusses approaches based on flat files, databases, and data lakes, and presents an implementation using PubSub, DataFlow, and BigQuery. Related material covers a scalable analytics pipeline and the evolution of game analytics platforms.
Business intelligence: Identifies common practices for ETLs, automated reports, and calculating KPIs and run-the-business metrics.
Exploratory analysis: Covers common analyses used for digging into data, such as building histograms and cumulative distribution functions, correlation analysis, and feature importance for linear models.
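As a rough illustration of the analyses named above, the following minimal sketch builds a histogram, an empirical cumulative distribution function, and a Pearson correlation using only the Python standard library. The helper names (`histogram`, `empirical_cdf`, `pearson`) and the sample data are hypothetical and are not taken from the article.

```python
from bisect import bisect_right
from collections import Counter
from math import sqrt


def histogram(values, bin_width):
    """Count values per bin; each key is the lower edge of its bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def empirical_cdf(values):
    """Return a function F where F(x) is the fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-user data: minutes per session and purchases made.
session_minutes = [3, 7, 12, 12, 18, 25, 31, 44]
purchases = [0, 0, 1, 1, 1, 2, 3, 4]

bins = histogram(session_minutes, bin_width=10)
cdf = empirical_cdf(session_minutes)
r = pearson(session_minutes, purchases)
```

Here `cdf(12)` gives the share of sessions lasting 12 minutes or less, and `r` summarizes how strongly session length and purchases move together in this made-up sample.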
Predictive modeling: Discusses approaches to both supervised and unsupervised learning, presents cross-promotion models, and covers methods for evaluating offline model performance.
Model production: Shows how to scale up models to score millions of records, and discusses different approaches to model deployment. Related material covers productizing models with DataFlow and productizing data science at Twitch.
Experimentation: Provides an introduction to A/B testing, discusses how to set up an experimentation framework for running tests, and presents an example analysis with bootstrapping. Related material covers A/B testing with staged rollouts.
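A bootstrapped A/B analysis of the kind mentioned above might look like the following minimal sketch. It resamples each group with replacement and reports a percentile confidence interval for the difference in conversion rates. The function name `bootstrap_diff_ci`, the group sizes, and the conversion counts are all assumptions made for illustration.

```python
import random


def bootstrap_diff_ci(control, treatment, n_boot=1000, alpha=0.05, seed=42):
    """Percentile bootstrap CI for mean(treatment) - mean(control)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping the group size.
        c = rng.choices(control, k=len(control))
        t = rng.choices(treatment, k=len(treatment))
        diffs.append(sum(t) / len(t) - sum(c) / len(c))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical outcomes: 1 = converted, 0 = did not convert.
control = [1] * 100 + [0] * 900      # 10% conversion
treatment = [1] * 150 + [0] * 850    # 15% conversion

observed = sum(treatment) / len(treatment) - sum(control) / len(control)
low, high = bootstrap_diff_ci(control, treatment)
```

If the interval `(low, high)` excludes zero, the resampled data are consistent with a real difference between the groups. The percentile method is only one of several bootstrap interval constructions and is used here for simplicity.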
Recommendation systems: Introduces the basics of recommendation systems and provides an example of scaling up a recommender for a production system. Related material covers prototyping a recommender.
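To make the idea of prototyping a recommender concrete, here is a minimal sketch of an item-to-item recommender based on cosine similarity over which users hold each item. This is one common prototyping approach and not necessarily the one the article uses. The function names and the sample user data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(user_items):
    """Cosine similarity between items, based on the users who hold them."""
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)

    sims = defaultdict(dict)
    items = sorted(item_users)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            overlap = len(item_users[a] & item_users[b])
            if overlap:
                score = overlap / sqrt(len(item_users[a]) * len(item_users[b]))
                sims[a][b] = score
                sims[b][a] = score
    return sims


def recommend(user, user_items, sims, k=3):
    """Rank items the user lacks by summed similarity to items they hold."""
    owned = user_items[user]
    scores = defaultdict(float)
    for item in owned:
        for other, score in sims.get(item, {}).items():
            if other not in owned:
                scores[other] += score
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda it: (-scores[it], it))[:k]


# Hypothetical data: which games each user has played.
user_items = {
    "ana": {"chess", "go", "checkers"},
    "ben": {"chess", "go"},
    "cy": {"chess", "poker"},
    "dee": {"go", "checkers"},
    "eli": {"chess"},
}

sims = item_similarities(user_items)
suggestions = recommend("eli", user_items, sims)
```

The pairwise loop is quadratic in the number of items, which is fine for a prototype but is the part that would need rethinking when scaling up to a production system.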
Deep learning: Provides an introduction to data science problems that are well suited to deep learning, such as flagging instant messages as offensive or abusive. It also provides examples of prototyping models with the R interface to Keras and connecting the R interface to CloudML.