Discover which companies are behind every visit to your website using Elasticsearch

How? What do we need?

Fine. Let’s begin!

  • AS (Autonomous System): consists of blocks of IP addresses that have a distinctly defined policy for accessing external networks and are administered by a single organization, although it may be made up of several operators. Each AS is identified by a number, the ASN.
  • Ingest pipeline: a feature provided by Elasticsearch. It is a definition of a series of processors that are executed in order and applied to the data before it is indexed, in this case to the IP address.
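
To make the AS idea concrete, here is a minimal Python sketch of what "linking an IP address to an AS" means. The table is hand-written and purely illustrative: the Google row reuses the values shown later in this article, and the second row uses an address block and AS number reserved for documentation. The names `ASN_TABLE` and `lookup_asn` are hypothetical; a real lookup relies on the GeoLite2 ASN database, not on a list like this.

```python
import ipaddress

# Hypothetical, hand-written table for illustration only.
# Each row: (network block, ASN, organization name).
ASN_TABLE = [
    (ipaddress.ip_network("8.8.8.0/24"), 15169, "Google LLC"),
    # Documentation-only network and ASN, not a real organization.
    (ipaddress.ip_network("192.0.2.0/24"), 64496, "Example Org (illustrative)"),
]


def lookup_asn(ip):
    """Return (asn, organization, network) for the first block containing ip, or None."""
    address = ipaddress.ip_address(ip)
    for network, asn, organization in ASN_TABLE:
        if address in network:
            return asn, organization, str(network)
    return None


print(lookup_asn("8.8.8.8"))   # the IP used in the simulate request in this article
print(lookup_asn("10.0.0.1"))  # a private address, not in the table
```

The ingest pipeline does this kind of lookup for us at index time, so we never have to maintain such a table ourselves.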

Ingest Pipeline

Let’s explain the pipeline that we are going to create.

  • processors: list of processors to be executed in order.
  • pipeline_deanonymize_ip: a unique ID to identify the pipeline.
  • geoip: the processor that we need.

PUT _ingest/pipeline/pipeline_deanonymize_ip
{
  "description": "Our pipeline to discover companies behind IP address",
  "processors": [
    {
      "geoip": {
        "field": "ip",
        "database_file": "GeoLite2-ASN.mmdb"
      }
    }
  ]
}
  • database_file: as we mentioned previously, we need to link the IP address to an ASN, so this option indicates which database the processor uses for that lookup.
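
If you prefer to create the pipeline from a script instead of the console, the same request can be sketched in Python with only the standard library. This is a sketch under assumptions, not the only way to do it: the base URL `http://localhost:9200` and the helper name `build_pipeline_request` are hypothetical, and the snippet only builds the request so it runs without a cluster.

```python
import json
import urllib.request


def build_pipeline_request(base_url, pipeline_id, field="ip",
                           database_file="GeoLite2-ASN.mmdb"):
    """Build (but do not send) the PUT request that defines the pipeline."""
    body = {
        "description": "Our pipeline to discover companies behind IP address",
        "processors": [
            # Same geoip processor as in the console request above.
            {"geoip": {"field": field, "database_file": database_file}}
        ],
    }
    return urllib.request.Request(
        url=f"{base_url}/_ingest/pipeline/{pipeline_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Assumption: a local, unsecured cluster; adjust the URL for your setup.
request = build_pipeline_request("http://localhost:9200", "pipeline_deanonymize_ip")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually send it against a running cluster:
# urllib.request.urlopen(request)
```

Keeping the field name and database file as parameters makes it easy to reuse the helper if your documents store the address under a different field.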

Simulate endpoint

Great! Now that we have our pipeline defined, let’s use it!

POST /_ingest/pipeline/pipeline_deanonymize_ip/_simulate
{
  "docs": [
    {
      "_source": {
        "ip": "8.8.8.8"
      }
    }
  ]
}

The response includes the organization behind the IP address:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_doc",
        "_id" : "_id",
        "_source" : {
          "geoip" : {
            "organization_name" : "Google LLC",
            "asn" : 15169,
            "ip" : "8.8.8.8",
            "network" : "8.8.8.0/24"
          },
          "ip" : "8.8.8.8"
        }
      }
    }
  ]
}
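
Once we have a response like this one, pulling out the company name is a matter of walking the JSON. The sketch below is a hedged example: `extract_organizations` is a hypothetical helper, and `sample_response` is simply the response above written as a Python dictionary. It skips entries without a `geoip` object, on the assumption that the processor adds nothing when an address is not found in the database, and entries without a `doc` key, which is how the simulate endpoint reports a failed document.

```python
def extract_organizations(simulate_response):
    """Return a list of (ip, organization_name, asn) from a _simulate response."""
    results = []
    for entry in simulate_response.get("docs", []):
        # Entries that failed carry an "error" key instead of "doc".
        source = entry.get("doc", {}).get("_source", {})
        geoip = source.get("geoip")
        if geoip is None:
            # Assumption: no geoip object means the lookup found nothing.
            continue
        results.append(
            (source.get("ip"), geoip.get("organization_name"), geoip.get("asn"))
        )
    return results


# The response shown above, as a Python dictionary.
sample_response = {
    "docs": [
        {
            "doc": {
                "_index": "_index",
                "_type": "_doc",
                "_id": "_id",
                "_source": {
                    "geoip": {
                        "organization_name": "Google LLC",
                        "asn": 15169,
                        "ip": "8.8.8.8",
                        "network": "8.8.8.0/24",
                    },
                    "ip": "8.8.8.8",
                },
            }
        }
    ]
}

print(extract_organizations(sample_response))
```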

Conclusion

As we saw, the ingest pipeline is easy to use, and Elastic provides us with a lot of powerful features and tools to make the most of our data. With just a few simple steps, we can discover which companies are interested in what we are offering.

Sr Software Engineer | Elasticsearch Consultant | https://www.diegogarea.com

Diego Garea Rey
