A Hot Topic: What Makes Scheduling the Diffusion Area so Challenging? [Tech Paper Review]
The diffusion area is particularly important to the smooth operation of a wafer fab. Not only does it receive raw wafers at the very beginning of the fabrication process but it also interacts with many other areas of the fab.
The challenge in scheduling the diffusion area lies in the particularities of its operation:
Re-entrant flows: Furnaces are loaded with wafers that may have already been processed by other furnaces or wet benches.
Batching machine: Several lots of 1-25 wafers each can be processed together in one batch if they run the same recipe.
Time constraints: The diffusion area has timelinks that determine the maximum time a wafer has to move to a subsequent tool to avoid rework or scrappage.
Dummy wafers: Used in furnaces, for example, to fill out a lot when a full lot is required or to protect the most exposed ends of a lot to ensure uniformity.
Balancing very long fixed processing times on batching tools with the features mentioned above makes it exceptionally tricky to get solid production KPIs on diffusion furnaces. Currently, fabs often resort to using simplistic "minimum batch size" dispatch rules that try to balance building full batches (to maximise the utilisation of the tool) with queue time and the risk of violating a timelink constraint.
As a result of these characteristics, it's very common for diffusion areas to become a bottleneck if not managed correctly – negatively impacting the production KPIs of the rest of the fab.
This is what prompts the exploration of more novel scheduling methods, such as the one we'll be discussing in this article.
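For illustration, a "minimum batch size" dispatch rule of the kind mentioned above might look like the sketch below. It is a simplified sketch and not any fab's actual rule: the Lot fields, the thresholds (min_lots, max_wait_min, timelink_margin_min) and the tie-breaking are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Lot:
    lot_id: str
    recipe: str
    waited_min: float         # time already spent queuing at the furnace
    timelink_left_min: float  # time left before the timelink expires (math.inf if none)


def pick_batch(queue, max_lots=6, min_lots=4, max_wait_min=120.0, timelink_margin_min=30.0):
    """Return the lots to load now, or [] to keep waiting for a fuller batch."""
    by_recipe = {}
    for lot in queue:
        by_recipe.setdefault(lot.recipe, []).append(lot)  # only same-recipe lots can share a batch

    best, best_key = [], (False, 0)
    for lots in by_recipe.values():
        # Most urgent lots first, so they make it into a capacity-limited batch.
        lots = sorted(lots, key=lambda l: l.timelink_left_min)[:max_lots]
        full_enough = len(lots) >= min_lots
        waited_too_long = any(l.waited_min >= max_wait_min for l in lots)
        at_risk = any(l.timelink_left_min <= timelink_margin_min for l in lots)
        if full_enough or waited_too_long or at_risk:
            key = (at_risk, len(lots))  # prefer at-risk batches, then larger ones
            if key > best_key:
                best, best_key = lots, key
    return best


queue = [
    Lot("A", "oxide", 10, math.inf),
    Lot("B", "oxide", 20, math.inf),
    Lot("C", "nitride", 5, math.inf),
]
print(pick_batch(queue))  # nothing qualifies yet, so the furnace keeps waiting
queue.append(Lot("D", "nitride", 5, 20.0))
print([lot.lot_id for lot in pick_batch(queue)])  # the at-risk nitride lots are released
```

The rule only looks at the furnace's own queue, which is exactly why it struggles with the trade-offs described above: it has no view of upstream arrivals or downstream bottlenecks.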
Case Study: Job Scheduling of Diffusion Furnaces
To explore the various ways to schedule diffusion areas, we review the paper “Job scheduling of diffusion furnaces in semiconductor fabrication facilities” by Wu et al. (2021), which describes a new scheduling system deployed live in a 200mm GlobalFoundries wafer fab.
Fab Characteristics and the Need for Change
The fab in which the system was implemented had approximately 300 products, 500 recipes, and 4,500 lots daily in the diffusion area, which hosts more than 90 furnaces.
The approach was designed to build schedules that maximise the weighted number of moves. The weight of each move depended on the wafer's product type and on the production stage at which the move was counted.
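As a rough illustration of such an objective, the sketch below sums a weight per completed move. The weight table, the (product, stage) keys and the choice to count moves per wafer are assumptions made for this example; the paper's actual weights are not reproduced here.

```python
# Hypothetical weights keyed by (product, stage); later stages are weighted higher here.
WEIGHTS = {("P1", "early"): 1.0, ("P1", "late"): 2.0, ("P2", "late"): 3.0}


def weighted_moves(moves, weights, default=1.0):
    """Sum the weights of completed moves; each move is a (product, stage, wafers) tuple."""
    return sum(weights.get((product, stage), default) * wafers
               for product, stage, wafers in moves)


moves = [("P1", "early", 25), ("P2", "late", 10), ("P3", "mid", 5)]
print(weighted_moves(moves, WEIGHTS))  # 25*1.0 + 10*3.0 + 5*1.0 = 60.0
```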
Before the new system was deployed, schedules were planned by 6 operators several times a day, taking up to 6 hours a day per operator on average. The quality of the schedules also depended on the judgement and experience of the operators, which led to suboptimal decisions and lower efficiency.
The Approach to Scheduling
The heuristic model used in the system took about nine months to build, whilst the system implementation took a year and a half, with the majority of the time spent on clarifying user requirements and collecting data.
The problem was addressed with a combination of two techniques, Dynamic Programming and a Genetic Algorithm:
Dynamic Programming consists of breaking down a large problem that contains many possible solutions into several sequential sub-problems that are easier to solve. Each of these subproblems is solved one at a time, such that each solution feeds into the next problem.
Each one of these sub-problems is then solved using a modified version of the Genetic Algorithm, a meta-heuristic procedure commonly used for large optimization problems.
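The idea of solving sequential sub-problems with a genetic algorithm can be sketched with a toy example. This is not the paper's algorithm: the per-furnace decomposition, the permutation encoding, the greedy decoder and all parameters below are assumptions chosen to keep the sketch short, and the stage-by-stage pass is a simplification of true dynamic programming.

```python
import random


def decode(order, lots, capacity, proc_min, horizon_min):
    """Greedily turn a lot ordering into batches of consecutive same-recipe lots."""
    batches, t = [], 0
    for lot_id in order:
        recipe = lots[lot_id][0]
        last = batches[-1] if batches else None
        if last and last["recipe"] == recipe and len(last["lots"]) < capacity:
            last["lots"].append(lot_id)              # join the current batch
        elif t + proc_min[recipe] <= horizon_min:    # start a new batch if it still fits
            batches.append({"recipe": recipe, "start": t, "lots": [lot_id]})
            t += proc_min[recipe]
    return batches


def crossover(a, b, rng):
    """Order crossover: keep a slice of parent a, fill the rest in parent b's order."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    middle = a[i:j]
    rest = [g for g in b if g not in middle]
    return rest[:i] + middle + rest[i:]


def solve_subproblem(lots, capacity, proc_min, horizon_min, rng, pop_size=30, generations=40):
    """Genetic algorithm for one furnace; fitness is the total weight of scheduled lots."""
    ids = list(lots)
    if not ids:
        return []

    def fitness(order):
        batches = decode(order, lots, capacity, proc_min, horizon_min)
        return sum(lots[l][1] for b in batches for l in b["lots"])

    population = [rng.sample(ids, len(ids)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]        # the fitter half survives
        children = []
        while len(parents) + len(children) < pop_size:
            child = crossover(*rng.sample(parents, 2), rng)
            if len(child) > 1 and rng.random() < 0.3:  # swap mutation
                i, j = rng.sample(range(len(child)), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        population = parents + children
    return decode(max(population, key=fitness), lots, capacity, proc_min, horizon_min)


def schedule_furnaces(lots, furnaces, capacity, proc_min, horizon_min, seed=0):
    """Solve one sub-problem per furnace; lots left over feed the next sub-problem."""
    rng = random.Random(seed)
    remaining, plan = dict(lots), {}
    for furnace in furnaces:
        plan[furnace] = solve_subproblem(remaining, capacity, proc_min, horizon_min, rng)
        for batch in plan[furnace]:
            for lot_id in batch["lots"]:
                del remaining[lot_id]
    return plan


# lot_id -> (recipe, weight); all values are made up for the example
lots = {f"L{i}": ("oxide" if i % 2 else "nitride", 1.0 + i % 3) for i in range(8)}
plan = schedule_furnaces(lots, ["F1", "F2"], capacity=3,
                         proc_min={"oxide": 240, "nitride": 300}, horizon_min=480)
for furnace, batches in plan.items():
    print(furnace, batches)
```

Each furnace's sub-problem sees only the lots that earlier sub-problems left unscheduled, which is the sense in which one solution feeds into the next. A production system would also need the constraints discussed in this article, such as timelinks, maintenance windows and tool dedications.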
After going live in the fab, average daily weighted moves per tool improved by 4.1% in the first two months of trials compared with the two months before deployment. When tested offline against historical data, the approach increased the number of moves by 23.4% and the average batch size by 4.1% while reducing tool idling by 62.8%. The authors attribute the smaller live improvement to the fab being short of staff, to demand and product mix varying over time, and to operators not yet fully adhering to the new schedules.
It is also expected that, by exploiting the full potential of the system, cycle time can be reduced by 1.8 days and that an increase of eleven thousand moves can be achieved, leading to an estimated financial saving of US$2M per year.
Much of the academic literature on furnace scheduling falls short in critical ways: approaches omit important constraints, are tested only on small datasets, or are prohibitively slow in live environments.
The reviewed approach stands out by addressing these issues and successfully implementing a complex scheduling system in a fab that brings measurable improvements to the number of moves, batch size and tool idleness. The model accounts for many relevant details such as preventive maintenance, lots with tool dedications at certain steps, different lot priorities and timelinks.
Nevertheless, as specialists in scheduling, we have spotted weaknesses in the approach where we believe there are opportunities to make it even more robust and versatile, whilst delivering even better results:
1. Schedule updates every 40 minutes: unexpected events (e.g., machine downtime) can occur while a schedule is being created, and that schedule will not reflect them. Suppose a furnace goes offline 10 minutes after the generation of a new schedule starts. Two things will happen:
a. The schedule being built (unaware of the machine outage) may dispatch lots to the offline tool.
b. The machine outage will only be handled in the next schedule, which arrives 70 minutes after the machine went down if each schedule takes the full 40 minutes to build.
2. Diffusion furnaces scheduled in isolation: Optimizing diffusion furnaces in isolation may cause other machines and areas to be neglected, resulting in suboptimal decisions. For example, the clean tools that feed the furnaces also serve other parts of the fab, so a schedule that does not take clean capacity into account cannot guarantee that the necessary WIP will arrive at the furnaces in time for the optimized schedule.
3. Assumption that transportation time of wafers is negligible compared to the processing time: despite the long processing times in furnaces, it would be worth including transportation times in the model to confirm whether they are indeed irrelevant for scheduling or whether they change decisions in the final schedule.
4. Loading and unloading time not addressed in the approach: Unlike processing times, which are fixed, loading and unloading times can vary with the number of wafers.
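The 70-minute figure in point 1 follows from simple arithmetic, sketched below. The function name and the timing model are assumptions for illustration: each schedule is taken to need a full cycle to build, to use the fab state at the moment its build starts, and to be published when the build ends.

```python
def outage_response_delay(cycle_min, outage_offset_min):
    """Minutes between an outage and the first published schedule that accounts for it."""
    if not 0 < outage_offset_min <= cycle_min:
        raise ValueError("the outage must fall within the current build cycle")
    # The build in progress misses the outage. The next build starts at cycle_min,
    # sees the outage, and is published at 2 * cycle_min.
    return 2 * cycle_min - outage_offset_min


print(outage_response_delay(cycle_min=40, outage_offset_min=10))  # 70
```

Under the same assumptions the worst case approaches twice the cycle length, so shortening the cycle shortens the reaction time proportionally.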
Flexciton’s solution has been built to schedule any area of a fab through multi-objective optimization, handling multiple fab KPIs with their trade-offs and sending an optimized schedule to the fab every 5 minutes. Below, we outline how we tackle the main challenges of furnace scheduling:
1. A fab-wide approach: our optimization engine schedules furnaces not in isolation but together with other machines across the fab. We take a holistic approach, looking ahead for bottlenecks across the entire factory and accounting for bottleneck tools when making scheduling decisions. For instance, a lower priority wafer may be dispatched before a high priority one if the former is going to a low-utilisation machine while the latter is going to a bottleneck in its next step.
2. Criticality of time constraints: whilst eliminating violations of timelinks, we account for the different criticalities they may present, be it because of the machines and recipes used or due to wafer priorities. This means that under a situation where one of two timelinks must be violated for reasons beyond our control, the less critical timelink will be violated.
3. Multi-objective optimization: We balance multiple KPIs simultaneously and handle their trade-offs through user-defined weights. For example, objectives such as “minimise timelink violations” and “minimise cycle time” can receive different weights depending on the desired behaviour in the fab. This directly impacts decisions such as “how long should a high priority wafer wait for a full batch?”.
4. New schedules every 5 minutes: Our technology is based on a hybrid approach that combines Mixed Integer Linear Programming (MILP) with heuristic and decomposition techniques, enabling the delivery of high-quality schedules to the fab every 5 minutes.
5. Change management: Adherence by operators and managers to a new scheduling system and its decisions is among the main post-implementation challenges. For this reason, our deployments follow a rigorous plan that helps foster higher adoption of the technology. We also use detailed Gantt charts to visualise schedules, which helps operators understand the decisions made and in turn enables higher adherence.
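To make the role of user-defined weights in point 3 concrete, the sketch below scores two options for a furnace: start a partial batch now, or wait for a full one. It is only a toy weighted-sum model, not Flexciton's MILP formulation. The cost terms, function names, weights and numbers are all assumptions for illustration.

```python
import math


def option_cost(wait_min, batch_lots, capacity, timelink_left_min, *, w_cycle, w_util, w_timelink):
    """Weighted-sum cost of starting a batch after waiting `wait_min` more minutes."""
    waiting_cost = wait_min * len(timelink_left_min)  # lot-minutes added to cycle time
    empty_slots = capacity - batch_lots               # unused furnace capacity
    violations = sum(1 for left in timelink_left_min if left < wait_min)
    return w_cycle * waiting_cost + w_util * empty_slots + w_timelink * violations


def choose_option(options, capacity, timelink_left_min, **weights):
    """Pick the (wait_min, batch_lots) option with the lowest weighted cost."""
    return min(options,
               key=lambda opt: option_cost(opt[0], opt[1], capacity, timelink_left_min, **weights))


options = [(0, 3), (60, 6)]    # start now with 3 lots, or wait 60 minutes for 6
no_timelinks = [math.inf] * 3  # three queued lots with no timelink
cycle_focused = dict(w_cycle=1.0, w_util=10.0, w_timelink=1000.0)
util_focused = dict(w_cycle=0.1, w_util=10.0, w_timelink=1000.0)

print(choose_option(options, 6, no_timelinks, **cycle_focused))         # (0, 3)
print(choose_option(options, 6, no_timelinks, **util_focused))          # (60, 6)
print(choose_option(options, 6, [45, math.inf, math.inf], **util_focused))  # (0, 3)
```

The same queue yields different decisions as the weights change, and a timelink that would expire during the wait overrides the preference for a full batch once its weight is large enough.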
As explored in this article, scheduling diffusion furnaces can be an extremely complex task. This is true even from a computational standpoint, leading many semiconductor fabs to rely on the judgement and experience of their operators, at the cost of suboptimal and inconsistent schedules that take hours to generate. On the other hand, using some fast scheduling systems may mean leaving some constraints out, ignoring certain KPIs or not considering the fab in its entirety.
At Flexciton, we combine the best of both worlds and bring fast optimal decisions while fostering technology adoption at all hierarchies of the fab.