If you have any machine with an SSH server open to the world and you take a look at your logs, you may be alarmed to see so many login attempts from so many unknown IP addresses. DenyHosts is a pretty neat service for Unix-based systems which works in the background reviewing such logs and appending the offending addresses to the hosts.deny file, thus blocking further brute-force attempts from them.
The following R snippet may be useful to quickly visualise a hosts.deny file with logs from DenyHosts. Such a file may contain comments (lines starting with #), and actual records are stored in the form <service>: <IP>. Since read.table skips # comments and splits fields on whitespace by default, it is more than enough to load the file into R. The rgeolocate package is used to geolocate the IPs, and the counts per country are represented in a world map using rworldmap:
library(dplyr)
library(rgeolocate)
library(rworldmap)

hosts.deny <- "/etc/hosts.deny"
# GeoLite2 country database bundled with rgeolocate
db <- system.file("extdata", "GeoLite2-Country.mmdb", package="rgeolocate")

# read the records (comment lines are skipped), geolocate the IPs,
# count them per country and plot the counts on a world map
read.table(hosts.deny, col.names=c("service", "IP")) %>%
  pull(IP) %>%
  maxmind(db, fields="country_code") %>%
  count(country_code) %>%
  as.data.frame() %>%
  joinCountryData2Map(joinCode="ISO2", nameJoinColumn="country_code") %>%
  mapCountryData(nameColumnToPlot="n", catMethod="pretty", mapTitle="Attacks per country")
## 74 codes from your data successfully matched countries in the map
## 2 codes from your data failed to match with a country code in the map
## 168 codes from the map weren't represented in your data
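To see why read.table copes with this format, here is a minimal sketch on a few made-up lines instead of the real file. The sample content is an assumption for illustration only (the addresses come from the reserved documentation ranges, and the layout of the comment line is hypothetical), and the names sample_lines and records are not part of the original snippet:

```r
# Hypothetical hosts.deny content: one comment line plus three records
sample_lines <- c(
  "# DenyHosts: Mon Jan  1 00:00:00 2018 | sshd: 192.0.2.10",
  "sshd: 192.0.2.10",
  "sshd: 198.51.100.7",
  "ALL: 203.0.113.25"
)

# By default, read.table drops everything after '#' and splits on whitespace,
# so each record yields two fields: "<service>:" and "<IP>"
records <- read.table(text=sample_lines, col.names=c("service", "IP"),
                      stringsAsFactors=FALSE)

# The service field keeps its trailing colon; strip it if you need the bare name
records$service <- sub(":$", "", records$service)

print(records)
```

The comment line vanishes and only the three records remain, which is all the pipeline above needs, since it only uses the IP column.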
Then, you may consider more specific access restrictions based on IP prefixes…
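As a rough first step in that direction, the blocked addresses can be grouped by prefix to see whether many of them share a network. The sketch below assumes IPv4 addresses and an arbitrary /24 grouping, which will not match real allocations in general; the addresses are made up and the names ips, prefix24 and counts are only illustrative:

```r
# Hypothetical blocked addresses (reserved documentation ranges)
ips <- c("192.0.2.10", "192.0.2.77", "198.51.100.7", "192.0.2.201")

# Replace the last octet to obtain the enclosing /24 prefix (IPv4 only)
prefix24 <- sub("\\.[0-9]+$", ".0/24", ips)

# Count offenders per prefix, most frequent first
counts <- sort(table(prefix24), decreasing=TRUE)

print(counts)
```

A prefix that accumulates many entries may be a candidate for a broader rule, although whether to block a whole range is a judgment call that depends on who legitimately connects from it.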
Country of origin is less useful than ASN data but geo-based rulesets can be helpful if one is sure one won’t be in said country ever.
Speaking as co-author of that pkg and a cybersecurity researcher: you almost never want to use the bundled data for cyber analyses. You need the geo-dbs and asn-dbs as they were at the time of the connection for the most accurate geo-mapping. That data changes more often than you may want to believe.
Also, maxmind dbs, esp the free ones, are more inaccurate than their stats suggest. We discuss that a bit in our 2017 National Exposure Index. https://www.rapid7.com/info/national-exposure-index/
[…] Article originally published in Enchufa2.es: Visualising SSH attacks with R. […]
[…] article was first published on R – Enchufa2, and kindly contributed to […]
Thanks, Bob. This is not meant for research. It’s just a very short snippet to get a quick first glance at what’s going on, and for country-level accuracy, these databases are not that bad. Otherwise, you are right: a more serious analysis requires the time of connection (DenyHosts reports this too, as you may know) and more complex queries to online databases, dealing with API limits and all that stuff.