numbers of dangerous migrations

missing migrants: a huge phenomenon in the Mediterranean

Migration is a topic of great interest, at least in Italy, the destination country of some migration routes. Writing this post helped the author form a data-based picture of the phenomenon; hopefully it will help readers too.

The following analysis focuses only on the numbers of the migratory phenomenon: the human stories behind these numbers, the political and economic situations that produce the phenomenon and the international policies that allow it have not been taken into consideration.

missing migrants data

The International Organization for Migration (IOM) Missing Migrants Project tracks migrants, including refugees and asylum-seekers, who have died or gone missing along mixed migration routes worldwide. Missing Migrants Project data is used to inform indicator 10.7.3 of the 2030 Agenda for Sustainable Development, the “number of people who died or disappeared in the process of migration towards an international destination.”

The IOM’s Missing Migrants Project publishes the collected data under a Creative Commons Attribution 4.0 International License.

The retrieved data contains 8698 reported events, from January 6, 2014 to the latest report on August 4, 2021. Each observation records the date of the report, the number of confirmed deaths, the estimated number of missing migrants, the migration route, the cause of death and the geographic coordinates of the event, together with the source of information and a rank of its quality.

Table 1: missing migrants data summary table

events  deaths  missing  survivors
8698    23822   19096    64106

missing migrants all over the world

Plotting the reported events on a world map helps to understand where the migration problem is concentrated.

The dark red marks, each referring to a single reported event, are concentrated in the following regions: Central America; Europe, the Middle East and Africa (EMEA); and South East Asia.

To better understand the geographical distribution of the reported events, the following infographic shows the number of reported events by region together with the estimated number of missing migrants.

While the Mediterranean Sea is the region that records the highest number of missing persons, the border between Mexico and the United States is the one that records the highest number of events.

A clearer way to explore the Missing Migrants Project data across the world is a scatter plot of the data summarized by region on the same two dimensions as above: the number of missing migrants and the number of reported events. A lighter color marks the regions where the quality of the reporting source is ranked higher.
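The per-region summary behind such a scatter plot can be sketched as follows. This is a minimal Python illustration (the analysis in this post was done in R), and it makes several assumptions: the records are toy values rather than real IOM data, the tuple layout is hypothetical, and the source quality is assumed to be a number where higher means better.

```python
from collections import defaultdict

# Toy records, NOT real IOM data: (region, missing, source_quality).
# The field layout and the quality scale (higher = better) are assumptions.
events = [
    ("Mediterranean", 40, 4),
    ("Mediterranean", 12, 3),
    ("US-Mexico Border", 1, 5),
    ("US-Mexico Border", 0, 5),
    ("US-Mexico Border", 2, 4),
    ("North Africa", 3, 2),
]

def summarize_by_region(records):
    """Return {region: (n_events, total_missing, mean_quality)}."""
    acc = defaultdict(lambda: [0, 0, 0])
    for region, missing, quality in records:
        acc[region][0] += 1        # one more reported event
        acc[region][1] += missing  # accumulate missing migrants
        acc[region][2] += quality  # accumulate quality to average later
    return {r: (n, m, q / n) for r, (n, m, q) in acc.items()}

summary = summarize_by_region(events)
for region, (n, missing, quality) in sorted(summary.items()):
    print(f"{region:18s} events={n} missing={missing} mean_quality={quality:.1f}")
```

Each region then becomes one point: events on one axis, missing migrants on the other, and mean quality mapped to color.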

The Mediterranean is the region with the most missing migrants reported, even though it has fewer reported events than the US-Mexico border. This is most likely because the migratory route involves crossing the sea on unsuitable boats crammed with migrants. The quality of information on migration events in this region is quite good.

The US-Mexico border has better reporting quality. The migratory route in this case is by land and journeys are often made in small groups, so the number of missing migrants is not as high.

Another relevant region as far as migration is concerned is North Africa. The region reports more events than the Mediterranean and more missing migrants than the US-Mexico border, while the quality of the reporting sources is worse.

most relevant cause of death

For each event, the missing migrants data lists as many causes of death as have been identified: for example, one event in North Africa reports 7 causes of death for the 3 deaths that occurred.
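Because one event can list several causes, any tally by cause has to decide how to attribute deaths. The Python sketch below shows one possible convention, which is an assumption and not necessarily the one used for the charts in this post: every cause listed for an event is credited with all of that event's deaths. The records are toy values and the comma-separated format is hypothetical.

```python
from collections import Counter

# Toy events, NOT real IOM data: (deaths, comma-separated causes).
events = [
    (3, "Drowning"),
    (2, "Dehydration,Starvation"),
    (1, "Drowning,Hypothermia"),
]

def count_causes(records):
    """Credit each listed cause with all the deaths of its event."""
    counts = Counter()
    for deaths, causes in records:
        for cause in causes.split(","):
            counts[cause.strip()] += deaths
    return counts

counts = count_causes(events)
total_deaths = sum(deaths for deaths, _ in events)

# Most frequent causes first.
for cause, n in counts.most_common():
    print(cause, n)
```

With this convention the per-cause counts overlap, so their sum can exceed the total number of deaths; a ranking built this way shows how often a cause is involved, not a partition of the deaths.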

Looking at all the reported deaths, the most relevant causes of death are visualized below.

Drowning is by far the most relevant cause, because many migration routes include sea crossings. Sickness and lack of access to medicines ranks second: it is easy to fall sick when travelling in harsh conditions. Vehicle accidents during migration cause many deaths. In fifth and sixth position, starvation and dehydration caused thousands of deaths. Not surprisingly, many causes in this macabre top 25 ranking are related to violence: Excessive Physical Abuse, Sexual Abuse, Shot, Stabbed, Violence.

Focusing on the Mediterranean Sea, these are the most frequent causes of death.

The causes of death in the Mediterranean reveal that most of the danger comes from crossing the sea in boats that are inadequate for the number of migrants on board, and from the conditions of the sea.

missing migrants through recent years

Plotting the monthly series of missing migrants grouped by region reveals once more that nothing in the world is comparable to what has happened, and is still ongoing, in the Mediterranean Sea as far as migration is concerned.

Therefore, from this point on, the analysis focuses on the Mediterranean region, the one closest to the places where the author lives.

Decomposing the monthly series of missing migrants for the Mediterranean makes it possible to clearly distinguish its components:

  • trend : rising from 2014 to 2017 and then decreasing, with a slight increase again after the COVID-19 outbreak;

  • seasonality : the amplitude of the seasonal component also decreased after 2017, but it remains visible;

  • remainder : the remainder component shows some shocks before 2017 and flattens in recent years.
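The idea of splitting a monthly series into trend, seasonality and remainder can be sketched in a few lines of Python. The post does not say which decomposition method was used, so this sketch uses classical additive decomposition as a simple stand-in; the series is synthetic, and the period of 12 months is an assumption based on the data being monthly.

```python
def decompose_additive(y, period=12):
    """Classical additive decomposition: y = trend + seasonal + remainder.

    Assumes an even period; trend and remainder are None at the edges,
    where the centered moving average is not defined.
    """
    n, half = len(y), period // 2
    trend = [None] * n
    # Centered 2 x period moving average: the two end points get half weight.
    for t in range(half, n - half):
        window = y[t - half : t + half + 1]
        trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    # Average the detrended values by position in the yearly cycle.
    sums, counts = [0.0] * period, [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += y[t] - trend[t]
            counts[t % period] += 1
    raw = [s / c for s, c in zip(sums, counts)]
    mean_raw = sum(raw) / period
    pattern = [r - mean_raw for r in raw]  # force the pattern to sum to zero
    seasonal = [pattern[t % period] for t in range(n)]
    remainder = [
        None if trend[t] is None else y[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, remainder

# Synthetic series, NOT real data: linear trend plus a fixed yearly pattern.
season = [30, 10, -5, -20, -25, -10, 5, 20, 25, 0, -10, -20]
y = [100 + 2 * t + season[t % 12] for t in range(36)]
trend, seasonal, remainder = decompose_additive(y)
print(trend[10], seasonal[:3], remainder[10])
```

On this noise-free series the method recovers the linear trend and the yearly pattern, leaving a remainder close to zero. On real data the remainder is where shocks such as single large shipwrecks would show up.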

macabre forecasting in the Mediterranean sea

To forecast the number of missing migrants over the next 12 months (from August 2021 to July 2022 inclusive), two forecasting methods for univariate time series have been used: exponential smoothing and SARIMA.

The exponential smoothing method produces forecasts that are weighted averages of past observations, with the weights decaying exponentially as the observations get older. The specific model used here includes additive trend and seasonality components.
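The "weighted average with exponentially decaying weights" idea is easiest to see in simple exponential smoothing, that is, without the trend and seasonal terms that the model in this post adds. The Python sketch below is only an illustration: the smoothing parameter, the initial level and the observations are arbitrary toy values.

```python
def ses_forecast(y, alpha, level0):
    """One-step-ahead forecast from simple exponential smoothing."""
    level = level0
    for obs in y:
        # New level = blend of the latest observation and the previous level.
        level = alpha * obs + (1 - alpha) * level
    return level

def ses_weights(n, alpha):
    """Weights placed on the n observations, most recent first."""
    return [alpha * (1 - alpha) ** j for j in range(n)]

y = [120, 90, 150, 110]      # toy monthly counts, NOT real data
alpha, level0 = 0.5, 100.0   # arbitrary illustrative values

forecast = ses_forecast(y, alpha, level0)
weights = ses_weights(len(y), alpha)

# The same forecast written explicitly as a weighted average of the past,
# plus the residual weight left on the initial level.
weighted = (
    sum(w * obs for w, obs in zip(weights, reversed(y)))
    + (1 - alpha) ** len(y) * level0
)
print(forecast, weighted, weights)
```

The recursion and the explicit weighted sum give the same number, and each weight is the previous one multiplied by (1 - alpha), which is the exponential decay described above.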

The Seasonal Autoregressive Integrated Moving Average (SARIMA) method produces forecasts by modeling the autocorrelation in time series data. The integrated element refers to differencing, which allows the method to handle time series with a trend. The specific model used here has a non-seasonal part of order (2,0,2) and a seasonal part of order (0,1,1); in each triple the first number is the order of the AR component, the second the order of differencing and the third the order of the MA component.
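Written out with the backshift operator $B$ (where $B y_t = y_{t-1}$), a model with these orders takes the form below. Two things are assumptions here: the seasonal period $m = 12$, chosen because the series is monthly, and the absence of a constant term, since the post does not say whether one was fitted. The symbols $\phi$, $\theta$ and $\Theta$ are the usual names for the AR, MA and seasonal MA coefficients.

```latex
% SARIMA(2,0,2)(0,1,1)_m with m = 12 assumed (monthly data), no constant
\[
  (1 - \phi_1 B - \phi_2 B^2)\,(1 - B^{m})\, y_t
  = (1 + \theta_1 B + \theta_2 B^2)\,(1 + \Theta_1 B^{m})\, \varepsilon_t ,
  \qquad m = 12
\]
```

The factor $(1 - B^{m})$ is the single seasonal difference (each month compared with the same month one year earlier), the left-hand polynomial carries the two AR terms, and the right-hand side combines the two MA terms with the single seasonal MA term.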

As the chart above shows, both models predict a similar pattern for the next year and achieve a similar Root Mean Square Error (RMSE). The shaded bands around the prediction line (darker for 80%, lighter for 95%) show the prediction intervals and indicate the uncertainty of the forecast.

The table below shows the RMSE of each model and the total number of missing migrants predicted for the next year, calculated by adding up the monthly forecasts.

.model                 RMSE  next_year_missing
exponential_smoothing  233   1840
SARIMA                 230   2009
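The two quantities in the table can be computed as in the following Python sketch. All numbers in it are toy values, not the series or forecasts from this post, and the post does not state on which observations the RMSE was evaluated, so the `actual` and `fitted` lists are purely illustrative.

```python
import math

def rmse(actual, fitted):
    """Root Mean Square Error between observed and fitted values."""
    squared = [(a - f) ** 2 for a, f in zip(actual, fitted)]
    return math.sqrt(sum(squared) / len(squared))

# Toy values, NOT real data.
actual = [200, 260, 180, 300]
fitted = [210, 250, 200, 280]
model_rmse = rmse(actual, fitted)

# Twelve toy monthly point forecasts; the yearly figure is their sum.
monthly_forecasts = [120, 140, 160, 180, 200, 190, 170, 150, 130, 110, 100, 90]
next_year_missing = sum(monthly_forecasts)

print(round(model_rmse, 1), next_year_missing)
```

The RMSE is expressed in the same unit as the series (missing migrants per month), which is why values around 230 in the table should be read against the typical monthly counts.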

As a baseline, take the naive forecast obtained by multiplying the average number of missing migrants per month by 12 months, which gives 2928. Against it, the reported forecasts are encouraging, as they show lower totals. Yet even the most optimistic prediction points to no fewer than 1,800 migrants missing next year.
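The comparison with the baseline is simple arithmetic, sketched here in Python. It reads 2928 as the yearly baseline (average per month times 12), which is an interpretation of the text; the only other inputs are the two totals from the table.

```python
# Yearly baseline, read as (average missing per month) x 12.
baseline_year = 2928

# Yearly totals reported in the table for the two models.
forecasts = {"exponential_smoothing": 1840, "SARIMA": 2009}

# Monthly average implied by the baseline.
implied_monthly = baseline_year / 12

# Relative reduction of each forecast with respect to the baseline.
reduction = {name: 1 - total / baseline_year for name, total in forecasts.items()}

print(implied_monthly)
for name, r in reduction.items():
    print(f"{name}: {r:.1%} below the baseline")
```

Under this reading the baseline corresponds to 244 missing migrants per month, and both models forecast a total roughly one third lower.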


The data clearly show that the worst migration drama is occurring in the Mediterranean Sea. Although the phenomenon has been decreasing in recent years, forecasts for next year still point to a large number of migrants missing in the Mediterranean. If the issue is of any interest to the international community, an intervention policy should be applied.

As stated at the beginning of this article, this is just a quick analysis of the data published by the IOM and there is no attempt to grasp the stories behind each individual lost on the road to hope.

However, a sense of unease emerges from the numbers analyzed.

But it’s summer vacation time and it’s so hot right now, so the author tries to free himself from this sense of discomfort as he prepares to take a refreshing dip in the Mediterranean Sea, which is also the scene of a huge migratory drama.

Feel free to email me if you would like to go deeper in the analysis, thanks for reading!

The analyses shown in this post were carried out using R as the main computation tool, together with its gorgeous ecosystem (tidyverse included). In particular, the time series analysis was based on the tsibble and fable packages.