Big Data Revisited: OpenX on Tools for Driving Insight and Revenue

Sorting Out the Most Relevant Bits Can Be Difficult

After a term or two as the buzzword du jour, Big Data is no longer quite the cause célèbre in digital advertising. Rather than gracefully gliding through the oceans of data created every day, industry players—particularly publishers—have realized they are swamped with far more data than they could ever know what to do with.

However, partners like OpenX are coming to the aid of publishers with insights drawn from a wide variety of data analyses. We sat down with OpenX VP of Product Management Andy Negrin to talk about the macro trends eluding publishers, the kind of toolset publishers need to get the most out of their data, the services the company offers to assist publisher analysis efforts, and much more.

GAVIN DUNAWAY: One of the drawbacks of big data is that there’s often too much of it, and it’s hard to figure out which data is most relevant. What do publishers tend to get distracted by?

ANDY NEGRIN: There are a couple of areas. A few years ago, clients kept asking for data. We gave it to them, but they weren’t doing a lot with it, because it’s overwhelming. You can’t pull millions of rows into Excel and graph a chart. You need to, for example, look at discard rates, or at an aggregate of the big data. You have to put it into a form that’s readable for humans or that fits comfortably with the tools you’re using.
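(As a rough illustration of the kind of aggregation Negrin describes, here is a minimal pandas sketch that collapses a raw auction log into a daily, human-readable summary. The file name and columns, request_id, timestamp and a boolean discarded flag, are hypothetical, not an actual OpenX schema.)

```python
import pandas as pd

# Hypothetical raw auction log: millions of rows, one per ad request.
log = pd.read_csv("auction_log.csv", parse_dates=["timestamp"])

# Collapse to a daily summary: request counts plus the share of bids
# discarded (assumes a boolean 'discarded' column in the log).
daily = (
    log.assign(date=log["timestamp"].dt.date)
       .groupby("date")
       .agg(requests=("request_id", "count"),
            discard_rate=("discarded", "mean"))
)
print(daily.head())
```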

Looking at a landscape of bids: Where are the clicks? Where are the opportunities to set floors? You want to do more of this systematically, as opposed to manually. There’s a lot of information in the bid data. When you’ve got five or 10 platforms, looking at the data is a bit overwhelming.
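(One way to make that floor analysis systematic: summarize the bid distribution per ad unit and derive a candidate floor from a percentile. A minimal sketch, assuming a unified bid log with hypothetical ad_unit and bid_price columns; the 25th-percentile floor is purely illustrative, not an OpenX recommendation.)

```python
import pandas as pd

# Hypothetical unified bid log across platforms.
bids = pd.read_csv("bids.csv")

# Survey the bid landscape per ad unit instead of eyeballing raw rows.
landscape = bids.groupby("ad_unit")["bid_price"].describe(
    percentiles=[0.25, 0.5, 0.9])

# One illustrative floor policy: the 25th percentile of observed bids.
candidate_floors = bids.groupby("ad_unit")["bid_price"].quantile(0.25)
print(candidate_floors)
```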

You need a transaction identifier that’s common across platforms. An advertiser might be referred to by one name in Platform A and another in Platform B. It might be the same call throughout, all the way back to the advertiser, but tying it together across platforms becomes a challenge.
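(A toy sketch of that identity problem: the same advertiser appears under different names on each platform, so you maintain a mapping to one shared ID. The alias table and fallback rule here are invented for illustration.)

```python
# Platform-specific advertiser names mapped to one canonical ID.
ALIAS_MAP = {
    ("platform_a", "Acme Inc"): "acme",
    ("platform_b", "ACME, Incorporated"): "acme",
}

def canonical_advertiser(platform: str, name: str) -> str:
    """Resolve a platform-specific name to a shared advertiser ID,
    falling back to a normalized form of the raw name."""
    return ALIAS_MAP.get((platform, name), name.strip().lower())

assert canonical_advertiser("platform_b", "ACME, Incorporated") == "acme"
```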

GD: What macro trends do you think publishers have trouble seeing and understanding on their sites?

AN: One is the shift to mobile, which really flipped over about a year ago. If you’re not looking down a level or two—at device usage, OS usage—you wouldn’t necessarily know that shift was happening on your site.

Another one is the shift in geographic mix as populations age and different kinds of users come in. Not paying attention to that can have a big impact on your business. These things are not hard to look at, but a lot of pubs will look more at the site level and not at user trends over time.
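(That “look down a level” check is easy to automate. A minimal sketch, assuming an impression log with hypothetical timestamp and device_type columns: compute each device’s monthly traffic share, and a rising mobile column makes the shift visible.)

```python
import pandas as pd

imps = pd.read_csv("impressions.csv", parse_dates=["timestamp"])
imps["month"] = imps["timestamp"].dt.to_period("M")

# Monthly traffic share by device class; the same groupby works for
# geographic mix if you swap device_type for a country column.
counts = imps.groupby(["month", "device_type"]).size()
share = counts / counts.groupby(level="month").transform("sum")
print(share.unstack("device_type"))
```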

GD: So what kind of resources do you think publishers really need? Are you talking about a crack team of data scientists?

AN: You don’t have to go that far. You can! But I think it’s more about the culture, hiring folks with BI aptitude, and giving them the tools and direction to go at it. We tell them to look at the user trends and page performance, and to make that part of a regular checkup process.

Also, leverage a partner like OpenX to get further insights. A core part of our service offering is helping publishers and demand partners better understand what is happening with their business through data and insights. Good partners will provide regular updates as well as more in-depth analysis during quarterly business reviews.

GD: What should publishers be expecting from a partner like OpenX?

AN: They should seek assistance and insights from their partners. Since we have a view that spans across publishers, we can identify some of the macro trends even sooner than they do. We can give them benchmarks within categories to say, “Your site performance is lagging based on your peer set.”

GD: Are there any specific tools that OpenX leverages to get more data and a better picture across publishers?

AN: We’ve had to go out on the cutting edge with big data. The data growth has been explosive over the last few years. In 2016, we saw about a 270% increase in request volume, and a 70% increase just in data processing.

That’s a macro trend publishers are also seeing. If they put five or 10 platforms or exchanges onto their page, that’s five or 10 times the bid data volume they might have to look at or process themselves. Given the explosion in data, we’ve deployed some of the new open-source technologies for processing, storing and manipulating data.

Impala is one we’re using—an open-source SQL query engine suited to high-cardinality data sets. We’ve also started working with Spark and Kafka. The more our internal users and external customers get a taste for going down levels and traversing large data sets, the more they want it at a higher frequency.
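(In that spirit, a minimal PySpark sketch of the general pattern: scan a large columnar bid log and roll it up to something a human or a BI tool can work with. The paths and schema are hypothetical, not OpenX’s actual pipeline.)

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bid-rollup").getOrCreate()

# Scan a large Parquet bid log and aggregate it down to a usable size.
bids = spark.read.parquet("s3://example-bucket/bids/2016/")
rollup = (
    bids.groupBy("publisher_id", "advertiser_id")
        .agg(F.count("*").alias("bids"),
             F.avg("bid_price").alias("avg_cpm"))
)
rollup.write.mode("overwrite").parquet("s3://example-bucket/rollups/")
```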

Publishers want the system to do more of the grunt work, so they can do more of the value-added work. People really enjoy seeing the trends and pulling the insights out. It’s one thing to pull a report together every week in Excel. It’s another to dig in and see something that can materially impact your business.

GD: Is there a certain type of data that presents more value than you’d expected?

AN: It would be really interesting to use open auction bid data to inform private marketplace and direct sold campaigns. Which advertisers are bidding on which kinds of sites, or which ones are spending a lot of money on your site? Maybe go talk to them. That data is fairly accessible, and I don’t think enough publishers are connecting the dots.
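(Connecting those dots can be as simple as ranking advertisers by open auction spend on your own inventory and treating the top of the list as a prospect sheet. A sketch, with hypothetical file and column names.)

```python
import pandas as pd

wins = pd.read_csv("won_impressions.csv")

# Advertisers already spending the most on your site in the open
# auction: natural candidates for a PMP or direct-deal conversation.
top_spenders = (
    wins.groupby("advertiser")["clearing_price"]
        .sum()
        .sort_values(ascending=False)
        .head(20)
)
print(top_spenders)
```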

Publishers have a bunch of first-party data that they’re not necessarily leveraging. There are publishers (app providers, for example) that have registration and geo data that’s incredibly useful, and that the buy side would be happy to pay for. We want to facilitate that connection through data syncing or creating deals. The more attributes of a user you have, the more ways you can facilitate the transfer of information, and the more value to the publisher and buyer.

GD: That’s interesting, because we’ve long maintained that programmatic should be a research tool, to better adjust your rate card and what you want from advertisers.

AN: It’s somewhat of a challenge to connect the site level and the audience level. With a retargeting campaign, you as the publisher don’t necessarily know why the advertiser is bidding on your site, but you know they’re spending a lot of money on some of your users. The publisher can at least start a conversation: “Can we give you preferential access or a special deal? Let’s figure it out.”

GD: You mentioned before that the old one-size-fits-all data platform has changed to a multi-purpose tool set. What in particular do publishers need in their tool set?

AN: A lot of publishers haven’t gone beyond Google Analytics and standard SQL databases, and if you want to pull together larger data sets, you need something like Hadoop-style storage. This year OpenX will be releasing new capabilities to allow publishers to explore large data sets in a fairly simple way.

In the past, a lot of publishers would create static dashboards and specific reports to try to pre-join data. But speaking to customers, we found they like the query-builder concept—pull your own data together in a way that makes sense for that particular purpose. Pre-canned reports keep you from doing that.
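(The query-builder idea, in miniature: compose an aggregate query from whatever dimensions and metrics the user picks, rather than shipping a fixed report. The table and column names below are invented for illustration.)

```python
def build_query(dimensions, metrics, table="ad_events"):
    """Compose a GROUP BY query from user-chosen dimensions and a
    {column: aggregate_function} mapping of metrics."""
    select = list(dimensions) + [
        f"{fn}({col}) AS {col}_{fn.lower()}" for col, fn in metrics.items()
    ]
    return (f"SELECT {', '.join(select)} FROM {table} "
            f"GROUP BY {', '.join(dimensions)}")

# e.g. SELECT date, device_type, SUM(revenue) AS revenue_sum ...
print(build_query(["date", "device_type"], {"revenue": "SUM"}))
```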

GD: What kind of challenges has that presented for OpenX?

AN: One is storing and delivering large data sets. Column-based enterprise software gets really expensive when you’re talking terabytes and petabytes of data, and then it becomes a real challenge to get it to perform. If you do a big query—say you want to pull what happened across all your ad units in 2016—it might have to run for a day. So we’ve taken some cool new technology to do in-memory processing and deliver the data with high-quality, low-latency performance and easy access.

In general, everyone needs to think not just about reporting, but about bringing data into their day-to-day, joining operational workflow and capabilities with the data.

GD: You can base real-time optimization on data, but at the same time, that could easily become overwhelming.

AN: It’s a balance, for sure. We have that debate a lot around what should be in real time versus near-real time, versus daily aggregation. Real time gets super expensive, and it’s frankly not always necessary. You wouldn’t want bid data coming through in near-real time; you can’t even process it. Some things you have to aggregate to make them viewable. If you try to look at hourly data, it’s too spiky. You have to smooth out the curves and step back to see what’s happening. You can’t get the gestalt of the whole thing without a bit more perspective.
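(The smoothing Negrin describes is a one-liner in practice: replace spiky hourly points with a 24-hour rolling mean to step back and see the trend. File and column names here are hypothetical.)

```python
import pandas as pd

hourly = pd.read_csv("hourly_revenue.csv", parse_dates=["hour"],
                     index_col="hour")

# Smooth spiky hourly data with a 24-hour rolling average.
smoothed = hourly["revenue"].rolling(window=24, min_periods=1).mean()
print(smoothed.tail())
```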