System Design
At the highest level, the RL system I’ve designed has three “layers.” In ascending order:
Layer 0 - Circuit: controls the dual-axis tracking panel, gathers data on power generation, transmits that data off the board
Layer 1 - Online RL Agent: consumes the information from layer 0 and interacts with layer 0 to request actions
Layer 2 - Offline RL Agent: an off-policy agent training in what would be the cloud in a larger production system
While I’ll start with Layer 0 and Layer 1, the point of Layer 2 is to mimic a production-intent system deployed in industry, wherein the cloud agent could train on data from many different online agents simultaneously (imagine Tesla’s Autopilot, where the AI learns from all of the cars, not just one).
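To make the layering concrete, here is a minimal Python sketch of how the three layers’ responsibilities might map to code. All of the names here (PanelState, Circuit, OnlineAgent, OfflineAgent) are hypothetical placeholders for illustration, not the actual implementation, and the policy is a random stub.

```python
import random
from dataclasses import dataclass

@dataclass
class PanelState:
    """Layer 0 measurement: tracker angles plus generated power."""
    azimuth_deg: float
    elevation_deg: float
    power_mw: float

class Circuit:
    """Layer 0 stand-in: in the real system this is the Arduino + tracking panel."""
    def read_state(self) -> PanelState:
        # Placeholder values; the real circuit measures these
        return PanelState(0.0, 0.0, random.uniform(0.0, 100.0))

    def apply_action(self, d_az: float, d_el: float) -> None:
        pass  # the real firmware would step the tracker motors here

class OnlineAgent:
    """Layer 1: requests actions from Layer 0 and records transitions."""
    def __init__(self, circuit: Circuit):
        self.circuit = circuit
        self.transitions = []

    def step(self) -> None:
        state = self.circuit.read_state()
        action = (random.uniform(-1, 1), random.uniform(-1, 1))  # stub policy
        self.circuit.apply_action(*action)
        next_state = self.circuit.read_state()
        reward = next_state.power_mw  # reward: power generated after the move
        self.transitions.append((state, action, reward, next_state))

class OfflineAgent:
    """Layer 2: pools experience from many online agents for off-policy training."""
    def __init__(self):
        self.replay_buffer = []

    def ingest(self, transitions) -> None:
        # In production this buffer would aggregate data from many deployed agents
        self.replay_buffer.extend(transitions)
```

The design point this sketch tries to capture is that Layer 1 only ever talks to Layer 0 through a narrow read-state / apply-action interface, while Layer 2 sees nothing but logged transitions, which is what lets it pool data from any number of online agents.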
The image below captures, at the highest level, how these layers interact.
Agent-Firmware Interaction
Looking at the relationship between the online RL agent and the circuit (and its associated firmware), we can explore some of the functions of each layer and how they interact.
The online RL agent will run in Python on a computer. It could eventually be ported to C or run in Python on a Raspberry Pi, but to maximize learning and focus on the RL concepts right now, I’ll be running Python on my MacBook.
Since the RL agent runs on a different device from the circuit that performs its actions, it needs a way to communicate with the circuit to request actions and get state measurements back. To do this, the computer communicates with the Arduino over a USB serial connection.
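As a rough sketch of that serial link, here’s how the Python side might look using pyserial. The port name, baud rate, and the MOVE/STATE message format are all assumptions for illustration; the actual firmware protocol will be defined as the project takes shape.

```python
import time
import serial  # pyserial

# Port name is what an Arduino typically enumerates as on macOS; yours may differ.
ser = serial.Serial("/dev/tty.usbmodem14101", baudrate=115200, timeout=2)
time.sleep(2)  # the Arduino resets when the port opens; give it a moment to boot

def request_action(d_az: float, d_el: float) -> str:
    """Send a (hypothetical) action request and return the firmware's state reply."""
    ser.write(f"MOVE {d_az:.2f} {d_el:.2f}\n".encode())  # newline-terminated ASCII command
    return ser.readline().decode().strip()               # e.g. "STATE <az> <el> <power>"

print(request_action(1.0, -0.5))
```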