5 Comments
Daniel Jacobsen

Thanks, good stuff. What are your thoughts on making RL work robustly in this context?

Nikhil Varshney

Thanks for reading and asking a very valid question.

----

RL turns inventory planning into an adaptive, goal-driven process rather than a fixed formula. In this context, an RL “agent” learns to place and move stock across multiple locations by trial and error in a simulated supply-chain environment, guided by business objectives.

You set up the agent and its environment, where the AI planner makes allocation decisions. The environment is essentially the digital twin of the network.

We then provide the agent with the system's state: inventory levels, warehouse capacities, and forecast distributions. Since it's a simulation, each decision is rewarded or penalized based on the outcome it produces.

This creates a what-if loop: the agent iterates until it refines its policy, a mapping from each state to the best action. The system becomes dynamic, adapting to seasonal trends, promotions, and unexpected shocks.

It allows you to balance the key operational and business levers: cost, speed, and stock-out risk.
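To make the loop above concrete, here is a minimal sketch: a toy single-location "digital twin" with random demand, plus a tabular Q-learning agent that learns a reorder policy. All names, capacities, costs, and reward weights here are illustrative assumptions, not a production design.

```python
import random

CAPACITY = 10          # warehouse capacity (assumed)
ACTIONS = range(0, 6)  # units to reorder each period (assumed)

def step(stock, order):
    """Simulate one period: receive the order, then face random demand."""
    stock = min(stock + order, CAPACITY)
    demand = random.randint(0, 5)  # stand-in for a forecast distribution
    sold = min(stock, demand)
    stock -= sold
    # Reward balances the levers above: revenue vs. holding cost
    # vs. a stock-out penalty (weights are arbitrary assumptions).
    reward = 5 * sold - 1 * stock - 10 * max(demand - sold, 0)
    return stock, reward

def train(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Learn a policy: a mapping from each stock level to a reorder quantity."""
    q = {(s, a): 0.0 for s in range(CAPACITY + 1) for a in ACTIONS}
    for _ in range(episodes):
        stock = CAPACITY // 2
        for _ in range(50):                # 50 periods per simulated episode
            if random.random() < epsilon:  # explore
                action = random.choice(list(ACTIONS))
            else:                          # exploit the current estimates
                action = max(ACTIONS, key=lambda a: q[(stock, a)])
            nxt, reward = step(stock, action)
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            q[(stock, action)] += alpha * (reward + gamma * best_next
                                           - q[(stock, action)])
            stock = nxt
    # The refined policy: best known action for each state
    return {s: max(ACTIONS, key=lambda a: q[(s, a)])
            for s in range(CAPACITY + 1)}

policy = train()
print(policy)
```

A real network would replace the random-demand stub with the digital twin and the table with a function approximator, but the state-action-reward-policy loop is the same shape.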

This is how I envision the implementation. I'm happy to explore your thoughts and any options for improvement; please let me know.

Nikhil Varshney

The goal is for the agent to capture the non-intuitive moves that operations or marketing often miss, like pre-positioning high-margin items in emerging hot zip codes.

Daniel Jacobsen

Sure. I was thinking about the challenges around obtaining robust RL solutions. But there is some promising work being done, such as https://arxiv.org/abs/2304.08769

Nikhil Varshney

Thanks for sharing this, Daniel. Appreciate it.

I read the paper's abstract, and it definitely sounds promising. I need to study more about how to build robust RL agents; my vision for now has RL added as an adaptive layer.

Once I've studied RL configurations, I'll probably write on this topic alone. Open to collaborating with you as well. Please check your DM.
