Turning raw data into meaningful insights

How we successfully compared apples and oranges

Gal Abramovitz
Apr 7, 2022

Everyone loves chocolate. Everyone loves pizza.
But can you tell if Lombardi’s is better than Hershey’s?

Akooda helps organizations understand what’s going on both internally, within the organization, and externally, with customers or prospects; manage resource allocation according to plan; and gain insight into their customers’ health, sentiment, and interactions.

Making sense of raw data is a complex challenge that even the most innovative tech companies face. Now imagine handling different types of data from multiple sources: conversations, meetings, documents, support tickets, and more.

In this article I’d like to focus on how Akooda’s flexibility allows us to solve this challenge. More specifically, rather than going into the mathematical methods we use to process our data, I’d like to give a taste of the challenges we face when processing different data types, and how we solve them.

Processing the raw data is crucial before we can analyze it and generate insights

Processing the data is crucial for a couple of reasons:

  1. If we look at each data source separately, we won’t be able to identify patterns that emerge only when looking at the full picture. For example, projects in any organization are discussed in meetings and conversations; and are often described more thoroughly in files or tickets.
    Connecting these digital breadcrumbs allows Akooda to look at clients, projects and processes as a whole.
  2. Analyzing each data type separately requires different logic per data source. This architecture is potentially redundant (Confluence pages and Google Docs have different metadata, but their contents are inherently similar) and significantly more expensive to maintain and extend, since each upgrade affects only a single connector.
    When the data follows a scheme regardless of its source, we can decouple third-party integrations from our business logic. This allows us to constantly improve our logic engine, while independently integrating more data sources quickly, efficiently and in a scalable fashion.
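To make the decoupling idea concrete, here is a minimal sketch of what such a source-agnostic model might look like. The class name `Interaction` comes from the article, but the field set and the converter functions are my own assumptions; Akooda’s actual schema is not described here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Interaction:
    """A source-agnostic record of organizational communication."""
    source: str               # e.g. "slack", "google_calendar", "jira"
    participants: list[str]
    start: datetime
    end: datetime


def from_calendar_event(event: dict) -> Interaction:
    # A meeting maps one-to-one onto an interaction.
    return Interaction(
        source="google_calendar",
        participants=event["attendees"],
        start=event["start"],
        end=event["end"],
    )


def from_conversation(messages: list[dict]) -> Interaction:
    # A whole conversation (not a single message) maps onto one interaction.
    return Interaction(
        source="slack",
        participants=sorted({m["author"] for m in messages}),
        start=min(m["ts"] for m in messages),
        end=max(m["ts"] for m in messages),
    )
```

Once every connector emits `Interaction` objects, the analysis engine downstream never needs to know whether the data came from Slack, a calendar, or a ticketing system.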

Semantically Defining our Model

In the spirit of Domain-Driven Design, we had to agree on a name that describes all the data we fetch. Given that our business is understanding organizational communications, we decided to name our model “interaction”.

Now we had to define what an interaction is for each data type.
For some - like meetings - it was straightforward: a meeting equals an interaction.
However, other types - like conversations - were trickier: while defining a single message as an interaction makes perfect sense, we noticed that a single message can be very short and carry no context (e.g. “Good morning”), and is then “outweighed” by a meeting (or a file, or a ticket).

For this reason, we decided to define a conversation as an interaction. This, in turn, raised the complicated challenge of splitting a stream of messages into conversations. But that’s a topic for another article 🙂
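The splitting itself deserves its own article, but the basic idea can be sketched with a naive time-gap heuristic: start a new conversation whenever the silence since the previous message exceeds some threshold. This is purely an illustration, not Akooda’s actual method.

```python
from datetime import datetime, timedelta


def split_conversations(messages: list[dict],
                        max_gap: timedelta = timedelta(minutes=30)) -> list[list[dict]]:
    """Group a time-ordered message stream into conversations.

    A new conversation starts whenever the gap since the previous
    message exceeds max_gap. (A naive heuristic for illustration only.)
    """
    conversations: list[list[dict]] = []
    current: list[dict] = []
    for msg in messages:
        if current and msg["ts"] - current[-1]["ts"] > max_gap:
            conversations.append(current)
            current = []
        current.append(msg)
    if current:
        conversations.append(current)
    return conversations
```

A real splitter would also weigh threads, mentions, and topic shifts; the time gap alone is just the simplest signal.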

Let’s get Unifying

Naturally, each interaction should have basic fields like its source (e.g. Slack), its participants, and its time. Wait, did you say “time”? Let’s consider three examples of defining interactions’ “life spans” by type:

  • The times when a meeting starts and ends are static and well-defined. Nice!
  • Conversations start the moment their first message is sent. However, we need to update a conversation’s end time whenever a new message arrives (and figure out when each conversation ends altogether).
  • A ticket can be created (i.e. start time) and immediately get moved to the backlog. After a few months someone can change its priority, then a few days later another person starts working on it — until a week later the ticket is resolved. Hence, it makes sense to define a ticket’s end time as its last update.
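The three cases above can be summarized in a small dispatch function. This is a sketch under assumed payload shapes (the dictionary keys are mine, not Akooda’s):

```python
from datetime import datetime


def life_span(kind: str, payload: dict) -> tuple[datetime, datetime]:
    """Return (start, end) for an interaction, according to its type."""
    if kind == "meeting":
        # Meeting boundaries are static and well-defined.
        return payload["start"], payload["end"]
    if kind == "conversation":
        # The end moves forward as new messages arrive.
        ts = [m["ts"] for m in payload["messages"]]
        return min(ts), max(ts)
    if kind == "ticket":
        # A ticket starts at creation and "ends" at its last update.
        ts = [e["ts"] for e in payload["events"]]
        return min(ts), max(ts)
    raise ValueError(f"unknown interaction type: {kind}")
```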

Perhaps an interaction should also have some kind of “weight” (e.g. the number of messages, the length of a meeting, etc.). This takes us into the realm of defining all sorts of unintuitive comparisons, such as how many messages equal a ticket status update. But even assuming we are given those “subjective” heuristics, we still need to address a fundamental issue: an interaction’s weight isn’t spread evenly throughout its life span.
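As a toy example of what such heuristics might look like, here is one possible weighting scheme. Every conversion rate below is made up; as noted, these comparisons are inherently subjective.

```python
# Illustrative conversion rates only; not Akooda's actual heuristics.
WEIGHT_PER_MESSAGE = 1.0
WEIGHT_PER_MEETING_MINUTE = 0.5
WEIGHT_PER_TICKET_UPDATE = 3.0


def interaction_weight(kind: str, **kwargs) -> float:
    """Assign a comparable "weight" to interactions of different types."""
    if kind == "conversation":
        return kwargs["num_messages"] * WEIGHT_PER_MESSAGE
    if kind == "meeting":
        return kwargs["minutes"] * WEIGHT_PER_MEETING_MINUTE
    if kind == "ticket":
        return kwargs["num_updates"] * WEIGHT_PER_TICKET_UPDATE
    raise ValueError(f"unknown interaction type: {kind}")
```

Under this scheme, a one-hour meeting “weighs” as much as thirty messages, which is exactly the kind of apples-to-oranges judgment call the heuristics have to encode.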

Improving Our Model

That was when we realized that an interaction’s weight and times are intertwined. We wanted to maintain the semantic meaning of interactions, which meant they might include a collection of singular events (e.g. messages, status updates).

Eventually we decided to represent this duality using a dedicated data structure we named “activity map”. Similar to a log, its keys are timestamps and its values are the weights of the corresponding events.

An example of a ticket activity map

Using this activity map, we were able to represent the times in which the interaction was (and wasn’t) active. This allowed us, for example, to analyze and visualize with much finer granularity the attention a project gets over time.
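A minimal sketch of the activity-map idea, assuming the timestamp-to-weight representation described above (the function names and the per-day aggregation are my own illustration):

```python
from collections import defaultdict
from datetime import datetime


def build_activity_map(events: list[tuple[datetime, float]]) -> dict:
    """Keys are event timestamps; values are the events' weights.

    Events landing on the same timestamp accumulate.
    """
    activity: defaultdict = defaultdict(float)
    for ts, weight in events:
        activity[ts] += weight
    return dict(activity)


def attention_by_day(activity_map: dict) -> dict:
    """Roll an activity map up into per-day totals, the kind of
    aggregation behind an attention-over-time chart."""
    daily: defaultdict = defaultdict(float)
    for ts, weight in activity_map.items():
        daily[ts.date()] += weight
    return dict(daily)
```

Because the map records when each unit of weight occurred, quiet stretches (like a ticket sitting in the backlog) naturally show up as gaps rather than being smeared across the interaction’s whole life span.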

Conclusion

This was a simplified example of a product challenge we had to solve as part of our R&D routine at Akooda. Obviously this hasn’t even scratched the surface of how complex it is to define a unified model. This specific model also takes into account topic and sentiment analysis, privacy restrictions, BI metrics and other specifications, both technical and product related.

If you find these challenges interesting and are as passionate about solving them as we are, please feel free to contact us! As an ambitious startup, we’re always hiring people with positive vibes and great skills!
