Eliminating Deforestation in the Palm Oil Supply Chain with Causal AI

Problem Definition

Palm oil is by far the most consumed vegetable oil on the planet and can be found in an estimated 50% of goods in UK supermarkets. Its growth reflects the rising demand for vegetable oils as a whole, which has increased more than ten-fold in the last 60 years. Because of its high relative productivity, oil palm accounts for 40% of worldwide vegetable oil production despite using only 10% of the cropland devoted to oil production.

Oil palm is a tropical plant species that thrives in humid conditions with high rainfall and plenty of sunlight. Unsurprisingly, the most popular areas for growing this crop are found close to the equator; Indonesia (56%), Malaysia (27%) and Thailand (4%) together account for the vast majority of worldwide palm oil production.

These regions are also well-known for being home to some of the world’s most important and sensitive rainforests. To achieve such a rapid expansion of palm oil production in recent years, immense amounts of land have been devoted to it worldwide, with around 20 million hectares currently in use. This means that irreplaceable, biodiversity-rich forests are being cleared for conventional palm oil plantations in order to meet the demand.

Unfortunately for the companies that produce products containing palm oil (consumer-packaged goods companies, or CPGs), deforestation is but one of the issues that palm oil is heavily associated with; others include forced and child labor, land rights violations and carbon emissions. When one of these issues is detected in their supply chain, CPGs are often held accountable by non-governmental organizations (NGOs), who publish their findings after gathering sufficient evidence.

Consumers are becoming increasingly conscious of social issues with regard to their purchasing habits, leading some to actively avoid products containing palm oil. This problem is exacerbated by social media trends and online news outlets – when a negative article is published associating palm oil with issues such as deforestation and loss of habitat, it can heavily damage a CPG’s brand, which ultimately results in a loss of product sales.

In order to repair some of the damage caused, CPGs are setting ambitious sustainability goals and attempting to entirely eliminate deforestation from their supply chains within the next few years. However, some of these goals are simply too difficult to achieve without the use of disruptive technologies. A common solution to this problem for many CPGs has been to gather geo-spatial data regarding their suppliers in order to detect deforestation, but this has proven insufficient – deforestation has already occurred by the time the data is gathered.

More broadly, this is indicative of a current problem facing the technology industry. Large enterprises have a wealth of data available to them from sensors and archives and are desperate to gain insights from it. The latest and greatest machine learning algorithms show promise, but often result in opaque, monolithic models with little to no demonstrable business value. Data science as it stands today is fundamentally flawed – a correlation-based approach is often taken, with poor alignment to organizational goals. Recently, however, the data science and machine learning community has started to recognize the importance of causal inference for practical business decision-making.

Value Proposition

Geminos is building a platform offering for CPGs to understand, mitigate and eliminate the root causes of issues such as deforestation from their supply chains. By monitoring changes in the key drivers of risk, decision makers will be empowered to intercept and prevent supply chain incidents, as opposed to simply reacting after the fact.

These kinds of improvements will be fundamental to businesses realizing truly sustainable sourcing and operations – suppliers that present a high risk can be replaced with a suitable alternative or given a higher priority for audit or other investigation. In doing this, CPGs will reduce both the frequency and impact of key incidents, minimizing damage to brand and sales as a result.

The ultimate goal of such a platform is to facilitate digital transformation in the supply chain industry, which according to a recent McKinsey study has a digitization level of only 43 percent, the lowest of five business areas that were examined. In the era of Industry 4.0, traditional supply chain companies face threats from disruptive, agile, fully digital alternatives. Finding fast and effective ways to utilize AI is vital in this new competitive arena, especially for mid-market companies that are yet to digitalize.

Causal inference, or causality, is the underlying concept being used to achieve this; by considering business processes as a series of causes and effects, root causes of key events can be identified. This methodology is vital to the overall approach being taken – causal models are built prior to data exploration.

By taking a causal approach to business problems, many of the common pitfalls of traditional data science can be avoided. Instead of immediately exploring vast amounts of data and drawing false conclusions from spurious connections, data requirements are determined from the causal models created. Only relevant data is included in analysis, allowing for transparent, focused and meaningful results.

Approach

As a proof point on which to base future work, deforestation in the palm oil supply chain was chosen as an initial use case. A similar approach could also be applied to other commodities (soy, cocoa, sugar, dairy) and social issues (emissions, labor, land rights) within CPG supply chains.

The process of modeling events is itself rather simple and is available in a low/no code environment through the platform, but the causal calculus embedded within is where the true complexity lies. Once causal models are hydrated with data, the causal effect of each event can be determined – in other words, how much each event or variable affects the final outcome.
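
To make the idea of a causal effect concrete, the sketch below uses synthetic data in which a confounder (labeled poverty) drives both a cause (agricultural investment) and the outcome (tree loss). The variable names, the coefficients and the linear form are all assumptions made for illustration; this is not the platform's causal calculus. A naive regression of the outcome on the cause overstates the effect, while adjusting for the confounder recovers a value close to the one built into the simulation.

```python
import numpy as np


def simulate(n=5000, true_effect=2.0, seed=0):
    """Generate synthetic data where poverty confounds investment and tree loss.

    All names and coefficients are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    poverty = rng.normal(size=n)
    investment = 1.5 * poverty + rng.normal(size=n)
    tree_loss = true_effect * investment + 3.0 * poverty + rng.normal(size=n)
    return poverty, investment, tree_loss


def ols_slope(y, *regressors):
    """Return the least-squares coefficient on the first regressor (intercept included)."""
    X = np.column_stack([*regressors, np.ones_like(y)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0]


poverty, investment, tree_loss = simulate()

# Naive estimate: ignores the confounder, so it mixes in poverty's influence.
naive = ols_slope(tree_loss, investment)

# Adjusted estimate: conditions on the confounder identified by the causal model.
adjusted = ols_slope(tree_loss, investment, poverty)

print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}, simulated effect: 2.00")
```

Which variables to adjust for is exactly the kind of question the causal model answers before any data is examined.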

At this point, these effect estimates can be linked back to individual entities within the data. For this use case, data for individual suppliers could be used to determine the chance of any given supplier being linked with a deforestation incident (or similar). The result is a trackable data point referred to as the supplier’s ‘risk profile’.
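
One way to picture this step is as a scoring function that combines effect estimates with each supplier's own driver values. The sketch below assumes a logistic form, and every driver name, weight and supplier in it is hypothetical; the source does not specify how the platform computes a risk profile.

```python
import math

# Hypothetical effect estimates, expressed as log-odds per unit of each driver.
EFFECTS = {
    "local_tree_loss": 0.8,
    "corruption_index": 0.5,
    "rspo_certified_share": -1.2,
}
BASELINE = -2.0  # assumed log-odds of an incident for an average supplier


def risk_profile(drivers):
    """Map a supplier's driver values to a probability between 0 and 1."""
    log_odds = BASELINE + sum(EFFECTS[name] * value for name, value in drivers.items())
    return 1.0 / (1.0 + math.exp(-log_odds))


# Made-up suppliers used purely to exercise the function.
suppliers = {
    "supplier_a": {"local_tree_loss": 1.5, "corruption_index": 1.0, "rspo_certified_share": 0.2},
    "supplier_b": {"local_tree_loss": -0.5, "corruption_index": 0.0, "rspo_certified_share": 0.9},
}

profiles = {name: risk_profile(drivers) for name, drivers in suppliers.items()}

# Highest-risk suppliers first, e.g. to prioritize audits.
ranked = sorted(profiles, key=profiles.get, reverse=True)
print(ranked)
```

Because the score is recomputed from current driver values, it can be tracked over time as those drivers change.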

Using a combination of research, official documentation and knowledge from subject-matter experts, causal models for deforestation, audits/grievances and brand impact can be created. These models are intended to be iterated on; as more domain knowledge is obtained, the models can continuously be changed and improved in order to match the real-world processes more closely. The causal models created also act as a foundation for the rest of the process, from which the data requirements and key causes/effects are determined.

The key root causes of deforestation were identified as palm oil demand, government/foreign investment in agriculture, and socio-economic measures such as poverty and corruption. Through a series of intermediary steps, these factors can be linked directly to deforestation. Following causal analysis of data supporting these events, risk profiles for each supplier within the supply chain can be calculated.
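
A causal model of this shape can be sketched as a directed graph whose edges point from cause to effect. In the example below the root causes follow the text, while the intermediate nodes (`plantation_expansion`, `weak_land_governance`, `forest_clearing`) are hypothetical placeholders, since the actual intermediary steps are not listed here.

```python
# Edges point from cause to effect. Intermediate nodes are illustrative assumptions.
CAUSAL_GRAPH = {
    "palm_oil_demand": ["plantation_expansion"],
    "agricultural_investment": ["plantation_expansion"],
    "poverty": ["weak_land_governance"],
    "corruption": ["weak_land_governance"],
    "plantation_expansion": ["forest_clearing"],
    "weak_land_governance": ["forest_clearing"],
    "forest_clearing": ["deforestation"],
    "deforestation": [],
}


def root_causes(graph):
    """Return the nodes that are never the effect of another node."""
    effects = {child for children in graph.values() for child in children}
    return sorted(node for node in graph if node not in effects)


def paths_to(graph, start, target):
    """List every causal path from start to target (the graph must be acyclic)."""
    if start == target:
        return [[target]]
    return [
        [start] + rest
        for child in graph[start]
        for rest in paths_to(graph, child, target)
    ]


print(root_causes(CAUSAL_GRAPH))
for path in paths_to(CAUSAL_GRAPH, "poverty", "deforestation"):
    print(" -> ".join(path))
```

Every node in such a graph is also a candidate data requirement, which is how the model drives data collection rather than the other way around.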

Knowledge

As mentioned before, data is a key part of causal analysis, and is only considered after creation of causal models. After considering the events leading to deforestation, two main sources of data were identified – internal data sources surrounding suppliers and audits, and external public data sources surrounding palm oil in general.

Internally, CPG companies may have data concerning the volumes of palm oil being sourced from various suppliers and the networks in place to achieve this, while also maintaining historical audit data. Any and all data of this kind would prove incredibly valuable in causal analysis but will vary in quantity and quality depending on the company in question.

Externally, there is a wide variety of publicly available data sources that provide excellent supporting information for this particular use case. Global Forest Watch provides global tree cover data allowing for calculation of yearly tree loss in relevant areas, while the Food and Agriculture Organization of the United Nations provides a wealth of data surrounding palm oil, including price, producer price index, imports and exports. Socio-economic factors such as relative-wealth index and corruption index are also available from various other sources.
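
As a minimal sketch of the yearly tree loss calculation, assume the tree cover for one sourcing area is available as a figure in hectares per year. The numbers below are made up for illustration and are not Global Forest Watch data, and the dictionary layout is an assumption rather than the format of any real export.

```python
# Illustrative tree cover (hectares) by year for a single area; values are invented.
tree_cover_ha = {2018: 10000.0, 2019: 9800.0, 2020: 9500.0, 2021: 9450.0}


def yearly_tree_loss(cover_by_year):
    """Return the loss in hectares for each year relative to the previous year.

    Increases in cover are reported as zero loss rather than as negative loss.
    """
    years = sorted(cover_by_year)
    return {
        year: max(cover_by_year[previous] - cover_by_year[year], 0.0)
        for previous, year in zip(years, years[1:])
    }


loss = yearly_tree_loss(tree_cover_ha)
print(loss)
```

A per-area series like this could then be joined to suppliers operating in that area.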

An important note on data requirements is that not all of the data needed to populate the causal models will be readily available, but this does not seriously impede causal analysis. Creating causal models will inform organizations as to what data should be collected in the future and will also provide an indication of the potential value of said data.
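
The gap between what a causal model needs and what is already held can be made explicit with a simple set comparison. The variable names below are hypothetical and stand in for whatever the model's nodes require.

```python
# Hypothetical variables required by a causal model, and those already available.
required = {
    "palm_oil_price",
    "tree_cover_loss",
    "corruption_index",
    "supplier_volume",
    "audit_outcome",
}
available = {"palm_oil_price", "tree_cover_loss", "corruption_index"}


def data_gaps(required, available):
    """Return the required variables for which no data is held yet."""
    return sorted(required - available)


gaps = data_gaps(required, available)
coverage = len(required & available) / len(required)
print(f"missing: {gaps}, coverage: {coverage:.0%}")
```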

Knowledge graphs represent a collection of interlinked descriptions of entities – objects, concepts or events. They provide a framework for data integration, unification, analytics and sharing. Knowledge models can be built in the same way as causal models within the platform itself.

In the specific context of this project, a knowledge model was created for the palm oil supply chain, defining entities such as organizations and suppliers, relationships between those entities, and any relevant attributes to support the data gathered. These attributes serve as one of the key links between the knowledge and causal models, but entities within the knowledge model are also able to serve as ‘actors’ in a causal model – certain events will be carried out by a particular entity.
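
The structure described above (entities, relationships and attributes, with entities doubling as actors in causal events) might be sketched as follows. The class names, the `sources_from` relation and the example entities are hypothetical and do not reflect the platform's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    kind: str  # e.g. "organization" or "supplier"
    attributes: dict = field(default_factory=dict)


@dataclass
class Relationship:
    source: str
    relation: str
    target: str


@dataclass
class KnowledgeModel:
    entities: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def add_entity(self, entity):
        self.entities[entity.name] = entity

    def relate(self, source, relation, target):
        # Both ends of a relationship must already exist in the model.
        if source not in self.entities or target not in self.entities:
            raise KeyError("both entities must be added before relating them")
        self.relationships.append(Relationship(source, relation, target))

    def neighbors(self, source, relation):
        """Return the targets linked from source by the given relation."""
        return [
            r.target
            for r in self.relationships
            if r.source == source and r.relation == relation
        ]


model = KnowledgeModel()
model.add_entity(Entity("ExampleCPG", "organization"))
model.add_entity(Entity("Mill-01", "supplier", {"rspo_certified_share": 0.6}))
model.relate("ExampleCPG", "sources_from", "Mill-01")

# An entity can serve as the actor of an event in a causal model.
event = {"event": "forest_clearing", "actor": "Mill-01"}
actor_attributes = model.entities[event["actor"]].attributes
print(model.neighbors("ExampleCPG", "sources_from"), actor_attributes)
```

Here the supplier's attributes are the link between the two models: the same values that describe the entity can feed the variables of a causal model in which that entity acts.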

Interface

The platform supports flexible implementation of user interfaces, facilitated by one of its main constituents (Node-RED). User interfaces will present relevant information based on the underlying causal models used to create them, but more importantly will also feature decision boards for key decisions such as auditing or replacing a supplier. These decisions will in turn feed back into any relevant causal models within the application.

Supplier risk profile (chance of future deforestation/grievance) is a recurring theme across the UI elements created so far and is derived from causal analysis of deforestation events. Smart recommendations are also made based on comparisons between similar suppliers – for example, “35% of suppliers with similar RSPO certification percentages have grievances reported”. See below for some mock-ups created, using real data where possible.

Conclusion

By understanding, mitigating and eliminating the causes of key issues such as deforestation, CPGs are able to achieve a truly sustainable supply chain, in alignment with their ambitious goals. Our platform takes a different approach to AI by merging causality with traditional data science and includes no/low-code capabilities to ease the creation of finished solutions. This methodology allows large enterprises to gain far greater value from their data and facilitates digital transformation by providing a consistent framework for building AI applications.