When Inference Meets Engineering

October 24, 2022

Othmane Rifki, Principal Applied Scientist at super{set} company Spectrum Labs, reports from the session he led at super{summit} 2022: "When Inference Meets Engineering." Using super{set} companies as examples, Othmane reveals the 3 ways that data science can benefit from engineering workflows to deliver business value.

By Othmane Rifki, Principal Applied Scientist at Spectrum Labs

super{set} companies were early adopters of data engineering, that is, using software engineering to power data science workflows and solve business problems. Many of us have been on the front lines of the emerging roles that sit between an organization’s traditional software engineers and data scientists: Data Engineers, who optimize the retrieval and use of data to power ML models, and Machine Learning Engineers, who ensure a scalable and flexible environment for ML model pipelines.

At super{summit} 2022, I organized a session for data and ML engineers to come together and share lessons learned from their particular workflows and business situations. Our conversation distilled 3 ways that data science can benefit from engineering workflows to deliver business value:

  1. Managing the complexity of the machine learning lifecycle at scale
  2. Creating business value means seeing models through to deployment and beyond
  3. Preserving data privacy to build trust with consumers

Let’s review!

Managing Machine Learning Lifecycles at Scale

Data science teams focus on building models that help businesses solve problems, for example identifying hate speech with deep learning models (see Spectrum Labs). The performance of those models is assessed with labeled datasets drawn from client traffic or other sources. All of this is quite manageable at a small scale, when there are only a handful of models to serve and customers can be counted on one hand.

When data science models scale, things start to break. The average super{set} company deals with a large number of models that must be deployed on behalf of multiple clients across a myriad of production environments. Understanding and managing these models and their dependencies at scale, while also mitigating the risks of decision automation (decision-making without human intervention), becomes critical to the success of business operations. Simply put: dollars and livelihoods are on the line as startups scale into meaningful businesses.

Data + ML engineering to the rescue! Data engineers and ML engineers work together to:

  • Optimize the retrieval of data needed to train models
  • Integrate machine learning models into an organization’s applications and systems
  • Ensure a scalable and flexible machine learning model pipeline from design to serving to monitoring
  • Build robust automation to ease the continuous delivery of model updates while maintaining high quality

Introducing MLOps

super{set} has more than just in-house expertise in managing ML lifecycles at scale (what we call Machine Learning Operations, or “MLOps”): one super{set} company, MarkovML, is entirely dedicated to solving the problem of MLOps! MarkovML connects ML teams and provides end-to-end visibility into their ML workflows so organizations can reach their business goals faster. As the team from MarkovML shared, MLOps is more than streamlining the process of deploying, monitoring, and maintaining ML models; it’s about improving the entire lifecycle by providing valuable insights into:

  • The performance of the model
  • The relevance of the data used for training
  • Connecting performance and relevance to the target business value

Once again, it all comes down to solving business problems. The burdensome, error-prone manual work of keeping track of an organization’s data and models can be replaced by automating the end-to-end machine learning workflow, freeing data science teams to focus on extracting insights tied to business objectives. MarkovML’s products are built to make that simpler.


Data Governance & Model Measurement Workflow at MarkovML

Creating Business Value Means Seeing Models Through to Deployment and Beyond

Creating business value doesn’t stop with model creation. Each super{set} company made clear that smoothly deploying new models into production is key to maximizing the value of the product offering. Ketch, a company that enables organizations to build trust with their consumers via privacy controls and data governance, shared the importance of making sure that models developed in isolation in a dev environment are ready for the production environment. For instance, when a model is developed with Python libraries but production runs on a Java runtime environment, the model has to be converted.

Data scientists can be well-served by using a model format such as ONNX, which is an open format built specifically to represent machine learning models. Look for model formats that are widely used, have built-in optimizations, and support a variety of machine learning frameworks, operating systems, and hardware platforms.

Post-Deployment Testing Strategies

Deploying models into production is far from the final step in providing business value. A deployed model can degrade in quality because a static model cannot keep up with new trends: change is the only constant. My company, Spectrum Labs, is dedicated to protecting users from disruptive behaviors and promoting healthy exchange via positive behaviors. We run sanity checks on our models prior to deployment and monitor their performance afterward to catch any degradation, which may trigger retraining the model on more representative data.
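To make the monitoring step concrete, here is a minimal sketch of a degradation check, assuming the model is a binary classifier scored with F1 and that a freshly labeled sample of live traffic is available. The `DegradationCheck` class, the `tolerance` of 0.02, and the choice of F1 are illustrative assumptions, not Spectrum Labs’ actual pipeline.

```python
from dataclasses import dataclass


def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """Binary F1 for labels in {0, 1}; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


@dataclass
class DegradationCheck:
    """Hypothetical check comparing a live metric to the one registered at training time."""

    registered_metric: float  # metric recorded when the model was trained/validated
    tolerance: float = 0.02   # assumed allowed absolute drop before flagging

    def needs_retraining(self, live_metric: float) -> bool:
        # Flag the model only if the live metric fell more than `tolerance` below the baseline.
        return (self.registered_metric - live_metric) > self.tolerance


# Example: ground-truth labels gathered from live traffic vs. the model's predictions.
check = DegradationCheck(registered_metric=0.90)
live_f1 = f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(f"live F1={live_f1:.3f}, retrain={check.needs_retraining(live_f1)}")
```

Here a live F1 of about 0.67 against a registered 0.90 would flag the model for retraining. An absolute tolerance keeps the rule easy to reason about; with small labeled samples, a confidence interval on the live metric may be a safer trigger.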

Machine learning project lifecycle

There are two approaches we at Spectrum Labs take to evaluate and monitor model performance post-deployment:

  1. Regression tests via ground truth evaluation. These tests pull in data from live traffic through the following deployment process:
     • Before deployment, a new model must first pass a carefully curated dataset that previous models also passed.
     • Some time after deployment, new data from live traffic is pulled and labeled to obtain ground truth, which is used to confirm that the model is not degrading relative to the metrics registered during the training phase.
  2. Smoke tests via drift detection, where data distributions are monitored to make sure they don’t diverge in a statistically significant way from those of the training and testing phases on one side and the development phase on the other.
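The drift-detection smoke test can be sketched with a two-sample Kolmogorov-Smirnov (KS) test, one common choice for comparing a numeric feature’s distribution in production against a reference sample from training. This is a minimal pure-Python sketch assuming a single numeric feature, non-empty samples, and the standard large-sample critical value; the function names and the 0.05 significance level are illustrative assumptions rather than the exact test we run.

```python
import math
from bisect import bisect_right


def ks_statistic(reference: list[float], live: list[float]) -> float:
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    if not reference or not live:
        raise ValueError("both samples must be non-empty")
    ref, cur = sorted(reference), sorted(live)
    n, m = len(ref), len(cur)
    d = 0.0
    # Both ECDFs are step functions that only jump at observed values,
    # so the maximum gap is attained at one of those values.
    for x in ref + cur:
        gap = abs(bisect_right(ref, x) / n - bisect_right(cur, x) / m)
        d = max(d, gap)
    return d


def has_drifted(reference: list[float], live: list[float], alpha: float = 0.05) -> bool:
    """Flag drift when D exceeds the asymptotic KS critical value at level alpha."""
    n, m = len(reference), len(live)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))  # about 1.358 for alpha = 0.05
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, live) > critical


# Example: a feature whose production values have shifted upward by 0.5.
training_sample = [i / 100 for i in range(100)]
production_sample = [0.5 + i / 100 for i in range(100)]
print(has_drifted(training_sample, production_sample))
```

In practice, `scipy.stats.ks_2samp` computes the same statistic along with a p-value. The critical-value approximation here is only reliable for reasonably large samples, and when many features are monitored at once a multiple-testing correction is worth considering.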

Sometimes this feedback loop from the production environment back to prototyping and development is not simply about quality assurance; it is central to the product’s value proposition and to how it solves a business problem. For instance, Sturdy unifies customer feedback from a variety of sources into one channel, after anonymizing and redacting PII, and uses machine learning to identify signals in the data that impact revenue retention. Through automation, these signals let Sturdy’s customers drive critical business processes and act on customer feedback as soon as it is received.

Sturdy gets all of your customer conversations and feedback in a single dashboard.
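As an illustration of the redaction step, here is a minimal regex-based sketch that replaces email addresses and North American-style phone numbers with placeholder tags before text enters an ML pipeline. The patterns and the `redact` function are hypothetical and deliberately narrow; this is not Sturdy’s implementation, and production PII detection usually needs much broader coverage (names, addresses, account numbers).

```python
import re

# Illustrative patterns only. Order matters: emails are replaced before phone numbers
# (dicts preserve insertion order in Python 3.7+).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"(?:\+\d{1,2}[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}"),
}


def redact(text: str) -> str:
    """Replace each PII match with a tag such as [EMAIL] or [PHONE]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact("Call me at (555) 123-4567 or email jane.doe@example.com"))
```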

Preserving Data Privacy to Build Trust with Consumers

Models depend on data, and the quality of that data has the biggest impact on a model’s performance. In many cases, the business value is derived from data that originates from people. As data scientists, software engineers, data engineers, and ML engineers, we’re not the true owners of this data, just custodians of data that is truly owned by others. In these cases, managing data and its privacy requires a set of controls to ensure that organizations deliver on their responsibilities to stakeholders, broadly defined.

Of chief concern is understanding:

  • The provenance of data used in training
  • How data was collected
  • How data was treated for bias
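One lightweight way to keep these three questions from being skipped is to attach a small record to every training dataset and refuse to train when any answer is blank. The `DatasetCard` class and `assert_trainable` helper below are hypothetical, a sketch of the idea rather than any super{set} company’s tooling.

```python
from dataclasses import dataclass

REQUIRED_FIELDS = ("provenance", "collection_method", "bias_treatment")


@dataclass
class DatasetCard:
    """Hypothetical metadata answering the three questions above for one dataset."""

    name: str
    provenance: str = ""         # where the data came from (e.g. which client traffic)
    collection_method: str = ""  # how it was collected, including consent basis
    bias_treatment: str = ""     # how it was audited or treated for bias

    def missing_fields(self) -> list[str]:
        # A field counts as missing if it is empty or whitespace-only.
        return [f for f in REQUIRED_FIELDS if not getattr(self, f).strip()]


def assert_trainable(card: DatasetCard) -> None:
    """Raise before training if any provenance question is unanswered."""
    missing = card.missing_fields()
    if missing:
        raise ValueError(f"{card.name}: missing {', '.join(missing)}")


card = DatasetCard("chat-sample", provenance="opt-in client traffic", collection_method="API logs")
print(card.missing_fields())
```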

Beyond the ethical use of data is the secure use of data: data must be kept off desktops and managed in a secure and traceable manner, with all personal information strictly removed. Conveniently, super{set} once again has in-house expertise: Ketch offers infrastructure for data privacy, compliance, and security, and Habu offers a secure data collaboration platform (a “data clean room”) with comprehensive analytics.
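To keep records traceable without storing raw identifiers, one common technique is keyed pseudonymization: replace each identifier with an HMAC of it, so the same person always maps to the same token, but the token cannot be linked back to the raw value without the secret key. This is a minimal sketch using Python’s standard `hmac` and `hashlib` modules; the `pseudonymize` function and its normalization rule are assumptions for illustration. Pseudonymized data can still be re-linked by whoever holds the key, so regulations such as the GDPR continue to treat it as personal data; it complements, rather than replaces, removing personal information.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Map a personal identifier to a stable token (HMAC-SHA256 hex digest).

    The key is assumed to live in a secrets manager, never alongside the data.
    """
    # Normalize so trivially different spellings of the same value map to one token.
    normalized = identifier.strip().lower()
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"example-key-from-a-secrets-manager"  # placeholder; never hard-code real keys
print(pseudonymize("Jane.Doe@Example.com ", key))
```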

Habu’s clean room software enables privacy-safe collaboration across multiple clients’ first-party data, producing valuable insights from aggregated data outputs.
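The core clean-room idea, that each party contributes data but only aggregates come out, can be sketched in a few lines: join two parties’ records on pseudonymized IDs, count the overlap per segment, and suppress any group below a minimum size so individuals cannot be singled out. The function name, inputs, and threshold of 50 are assumptions; this illustrates the general technique, not Habu’s implementation, and real clean rooms typically layer access controls and query restrictions on top.

```python
from collections import Counter


def overlap_by_segment(
    party_a: dict[str, str],
    party_b: set[str],
    min_group_size: int = 50,
) -> dict[str, int]:
    """Count party A's users per segment who also appear in party B, aggregate-only.

    party_a maps a pseudonymized user ID to a segment label (e.g. an audience category);
    party_b is a set of pseudonymized user IDs (e.g. customers who converted).
    Segments with fewer than `min_group_size` matches are suppressed entirely.
    """
    overlap = Counter(segment for uid, segment in party_a.items() if uid in party_b)
    return {segment: n for segment, n in overlap.items() if n >= min_group_size}


# Example: 120 matched users in "sports" are reported; 30 in "news" fall below the threshold.
advertiser = {f"id{i}": ("sports" if i < 120 else "news") for i in range(150)}
publisher = {f"id{i}" for i in range(150)}
print(overlap_by_segment(advertiser, publisher))
```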

Data and ML engineering is an emerging field. As data scientists, software engineers, data engineers, and ML engineers, it’s always helpful to compare notes and get up to speed on the best practices our friends at other organizations are applying to their products. I will say this: only in super{set} will you find a community of data practitioners who are not just leveraging data to solve business problems, but are also building businesses to solve data problems!
