The diffusion area is particularly important to the smooth operation of a wafer fab. Not only does it receive raw wafers at the very beginning of the fabrication process but it also interacts with many other areas of the fab.
The challenge in scheduling the diffusion area lies in the particularities of its operation:
Balancing very long fixed processing times on batching tools with the features mentioned above makes it exceptionally tricky to get solid production KPIs on diffusion furnaces. Currently, fabs often resort to using simplistic "minimum batch size" dispatch rules that try to balance building full batches (to maximise the utilisation of the tool) with queue time and the risk of violating a timelink constraint.
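A "minimum batch size" rule of this kind can be sketched as a simple decision function. This is a toy illustration only; the thresholds and parameter names are invented, not taken from any fab's actual dispatch logic:

```python
def should_start_batch(queue_len, max_batch, min_batch, longest_wait_min,
                       max_wait_min, timelink_slack_min, start_margin_min=30):
    """Simplistic minimum-batch-size dispatch rule (illustrative).

    Start the furnace when the batch is full, or when a partial batch has
    reached the minimum size and waiting longer risks either the queue-time
    target or a timelink violation.
    """
    if queue_len >= max_batch:
        return True                       # full batch: always start
    if queue_len < min_batch:
        return False                      # below minimum: keep waiting
    if longest_wait_min >= max_wait_min:
        return True                       # queue-time target exceeded
    if timelink_slack_min <= start_margin_min:
        return True                       # timelink about to be violated
    return False
```

The tension the rule tries to resolve is visible in its branches: every extra minute of waiting may grow the batch (better utilisation) but also burns queue time and timelink slack.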
As a result of these characteristics, it's very common for diffusion areas to become a bottleneck if not managed correctly – negatively impacting the production KPIs of the rest of the fab.
This is what prompts the exploration of novel scheduling methods, such as the one we'll be discussing in this article.
To explore the various ways to schedule diffusion areas, we review the paper “Job scheduling of diffusion furnaces in semiconductor fabrication facilities” by Wu et al. (2021) that describes a new scheduling system that was deployed live in a 200mm GlobalFoundries wafer fab.
The fab in which the system was implemented runs approximately 300 products and 500 recipes, with around 4,500 lots passing daily through the diffusion area, which hosts more than 90 furnaces.
The approach was designed to build schedules that maximise the weighted number of moves, with weights based on the wafer's product and the stage of production for which moves were being calculated.
Previously, schedules were planned manually by 6 operators several times a day, taking up to 6 hours a day per operator on average. The quality of these schedules also depended on the judgement and experience of the operators, which led to suboptimal decisions and lower efficiency.
The heuristic model used in the system took about nine months to build, whilst the system implementation took a year and a half, with the majority of the time spent on clarifying user requirements and collecting data.
The problem was addressed with two techniques, Dynamic Programming and a Genetic Algorithm:
Dynamic Programming consists of breaking down a large problem that contains many possible solutions into several sequential sub-problems that are easier to solve. Each of these subproblems is solved one at a time, such that each solution feeds into the next problem.
Each one of these sub-problems is then solved using a modified version of the Genetic Algorithm, a meta-heuristic procedure commonly used for large optimization problems.
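The paper's modified Genetic Algorithm is considerably more sophisticated, but the basic mechanics (a population of candidate lot sequences evolved through selection, crossover and mutation) can be sketched as follows. All names, weights and the fitness function here are invented for illustration and are not the authors' formulation:

```python
import random

def fitness(order, lot_weights, batch_size):
    """Score a lot sequence: weighted moves, discounting later batches
    (lots placed in earlier furnace runs contribute more)."""
    score = 0.0
    for pos, lot in enumerate(order):
        batch_index = pos // batch_size        # which furnace run this lot joins
        score += lot_weights[lot] / (1 + batch_index)
    return score

def genetic_schedule(lot_weights, batch_size=4, pop=30, gens=200, seed=0):
    """Evolve lot sequences via selection, crossover and mutation (toy GA)."""
    rng = random.Random(seed)
    lots = list(lot_weights)
    population = [rng.sample(lots, len(lots)) for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda o: fitness(o, lot_weights, batch_size),
                        reverse=True)
        parents = population[: pop // 2]       # selection: keep the best half
        children = []
        while len(parents) + len(children) < pop:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(lots))  # one-point order crossover
            head = a[:cut]
            child = head + [l for l in b if l not in head]
            i, j = rng.randrange(len(lots)), rng.randrange(len(lots))
            child[i], child[j] = child[j], child[i]   # swap mutation
            children.append(child)
        population = parents + children
    return max(population, key=lambda o: fitness(o, lot_weights, batch_size))
```

Run on a handful of lots, the algorithm quickly learns to place high-weight lots into the earliest batches, which is the same qualitative behaviour the paper's objective rewards at far larger scale.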
After implementing live in the fab, average daily weighted moves per tool improved by 4.1% in the first two months of trials when compared to the 2 months before deployment. When tested offline and compared to historical data, the approach increased the number of moves by 23.4% and the average batch size by 4.1% while reducing tool idling by 62.8%. The authors attribute the smaller live improvement to the fab being short-staffed, to demand and product mix varying over time, and to operators not yet fully adhering to the new schedules.
It is also expected that, by exploiting the full potential of the system, cycle time can be reduced by 1.8 days and an additional eleven thousand moves can be achieved, leading to an estimated saving of $2M per year.
Much of the academic literature on scheduling furnaces omits rather critical details: approaches miss important constraints, are tested only on small datasets, or are prohibitively slow in live environments.
The reviewed approach stands out by addressing these issues and successfully implementing a complex scheduling system in a fab that brings measurable improvements to the number of moves, batch size and tool idleness. The model accounts for many relevant details such as preventive maintenance, lots with tool dedications at certain steps and different lot priorities.
Nevertheless, as specialists in scheduling, we have spotted weaknesses in the approach where we believe there are opportunities to make it even more robust and versatile, whilst delivering even better results:
1. Schedule updates only every 40 minutes: the response to unexpected events (e.g., machine downtime) can lag by more than one scheduling cycle. Suppose a furnace goes offline 10 minutes after the generation of a new schedule starts. Two things will happen:
a. The schedule being built (unaware of the machine outage) may dispatch lots to the offline tool.
b. The machine outage will be handled only in the next schedule, 70 minutes after the machine went down.
2. Diffusion furnaces scheduled in isolation: Optimizing diffusion furnaces in isolation may cause other machines and areas to be neglected – resulting in suboptimal decisions. For example, the furnaces are fed by clean tools; having not taken clean capacity into account, there is no guarantee that the necessary WIP will arrive at the furnaces to accommodate the optimized schedule.
3. Assumption that transportation time of wafers is negligible compared to the processing time: despite the long processing times in furnaces, it would be worth testing transportation times in the model to confirm whether they are indeed irrelevant for scheduling or whether they change the decisions in the final schedule.
4. Loading and unloading time not addressed in the approach: Unlike processing times, which are fixed, loading and unloading times vary with the number of wafers.
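The timing problem in point 1 can be made concrete with a little arithmetic. This is a sketch of the worst case described above, not a model of the paper's actual system:

```python
def event_handling_delay(period_min, event_offset_min):
    """Worst-case delay before a schedule reflects an unexpected event.

    A new schedule is generated every `period_min` minutes.  An event
    (e.g. a furnace outage) occurring `event_offset_min` minutes into the
    current generation is missed by the schedule being built and is only
    reflected when the *next* generation completes.
    """
    remaining = period_min - event_offset_min   # rest of the current generation
    return remaining + period_min               # plus one full next cycle

# An outage 10 minutes into a 40-minute cycle is handled 70 minutes later.
```

Under this model, shortening the scheduling period directly caps the staleness: with a 5-minute cycle, even the worst case falls to 10 minutes.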
Flexciton’s solution has been built to schedule any area of a fab through multi-objective optimization, handling multiple fab KPIs with their trade-offs and sending an optimized schedule to the fab every 5 minutes. Below, we outline how we tackle the main challenges of furnace scheduling:
1. A fab-wide approach: our optimization engine schedules furnaces not in isolation but together with other machines across the fab. We utilise a holistic approach, looking ahead for bottlenecks across the entire factory and accounting for bottleneck tools when making scheduling decisions. For instance, a lower-priority wafer may be dispatched before a high-priority one if the former is going to a low-utilisation machine while the latter is going to a bottleneck in its next step.
2. Criticality of time constraints: whilst eliminating violations of timelinks, we account for the different criticalities they may present, be it because of the machines and recipes used or due to wafer priorities. This means that under a situation where one of two timelinks must be violated for reasons beyond our control, the less critical timelink will be violated.
3. Multi-objective optimization: We balance multiple KPIs simultaneously and handle their trade-offs through user-defined weights. For example, objectives such as “minimise timelink violations” and “minimise cycle time” can receive different weights depending on the desired behaviour in the fab. This directly impacts decisions such as “how long should a high priority wafer wait for a full batch?”.
4. New schedules every 5 minutes: Our technology is based on a hybrid approach that combines Mixed Integer Linear Programming (MILP) with heuristic and decomposition techniques, enabling the delivery of high-quality schedules to the fab every 5 minutes.
5. Change management: Adherence by operators and managers to a new scheduling system and its decisions is among the main post-implementation challenges. Because of that, our deployments follow a rigorous plan that helps foster higher adoption of the technology. We also use detailed Gantt charts to aid the visualisation of schedules, which facilitates a solid understanding of the decisions made and, in turn, higher adherence from operators.
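The weighted-KPI balancing described in point 3 can be sketched as a simple scoring function. The KPI names, numbers and weights below are invented for illustration, and enumeration over two hand-written candidates stands in for what is in practice a MILP search:

```python
def schedule_score(metrics, weights):
    """Weighted-sum objective over fab KPIs (lower is better here)."""
    return sum(weights[k] * metrics[k] for k in weights)

def pick_schedule(candidates, weights):
    """Choose the candidate schedule with the best weighted score."""
    return min(candidates, key=lambda c: schedule_score(c["metrics"], weights))

# Two candidate schedules: start a partial batch now, or wait to fill it.
candidates = [
    {"name": "start_partial_batch",
     "metrics": {"timelink_violations": 0, "cycle_time_h": 30, "idle_h": 4}},
    {"name": "wait_for_full_batch",
     "metrics": {"timelink_violations": 1, "cycle_time_h": 26, "idle_h": 2}},
]

# Penalising timelink violations heavily favours starting the partial batch;
# zeroing that weight flips the decision towards the shorter cycle time.
weights = {"timelink_violations": 100, "cycle_time_h": 1, "idle_h": 1}
```

Changing the weights changes the answer to questions like "how long should a high-priority wafer wait for a full batch?", which is exactly the trade-off the user-defined weights expose.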
As explored in this article, scheduling diffusion furnaces can be an extremely complex task. This is true even from a computational standpoint, leading many semiconductor fabs to rely on the judgement and experience of their operators at the cost of obtaining suboptimal and inconsistent schedules that take hours to generate. On the other hand, the usage of some fast-scheduling systems may mean leaving some constraints behind, ignoring different KPIs or not observing the fab in its entirety.
At Flexciton, we combine the best of both worlds and bring fast optimal decisions while fostering technology adoption at all hierarchies of the fab.
In the first part of 'C for Cycle Time', we explored the essence of cycle time in front-end wafer fabs and its significance for semiconductor companies. We introduced the operating curve, which illustrates the relationship between fab cycle time and factory utilization, as well as the power of predictability and the ripple effects cycle time can have across the supply chain.
In part 2, we will explore strategies to enhance cycle time through advanced scheduling solutions, contrasting them with traditional methods. We will use the operating curve, this time to demonstrate how advanced scheduling and operational factors, such as product mix and factory load, can significantly impact fab cycle time.
By embracing the principles of traditional Lean Manufacturing, essentially focused on reducing waste in production, cycle time can be effectively reduced [1]. Here are a few strategies that can help improve fab cycle time:
The implementation of an advanced AI scheduler can facilitate most of the strategies noted above, leading to an improvement in cycle time with significantly less effort demanded from a wafer fab compared to alternatives such as acquiring new tools. In the next sections we are going to see how this technology can make your existing tools move wafers faster without changing any hardware!
In this section, we delve into how an advanced AI scheduler (AI Scheduler) can maintain factory utilization while reducing cycle time.
First, let’s define what an AI Scheduler is. It is an essential piece of fab software whose core engine is powered by AI models such as mathematical optimization. It can adapt to ongoing real-time changes in fab conditions, including variations in product mix, tool downtimes, and processing times. Its decisions can achieve superior fab objectives, such as improved cycle time, surpassing the capabilities of heuristic-based legacy scheduling systems. More aspects of an advanced AI scheduler can be found in our previous article, A is for AI. The AI Scheduler optimally schedules fab production in alignment with lean manufacturing principles, sequencing lots optimally and strategically batching and assigning them to tools.
Figure 5 shows an example of how an AI Scheduler can successfully shift the cycle time from the original operating curve closer to the theoretical operating curve. As a result, cycle time is now 30 days at 60% factory utilization. This can be accomplished by enhancing fab efficiency through measures such as minimizing idle times, reducing re-work, and mitigating variability in operations, among other strategies. In the next sections, we will show two examples, in metrology and diffusion, of how cycle time is improved with optimal scheduling.
Many wafer fabs employ a tool pull-system for dispatching. In this approach, operators typically decide which idle tool to attend to, based on their experience or, at times, randomly. Once at the tool, they then select the highest priority lots from those available for processing. A drawback of this system is that operators don't have a comprehensive view of the compatibility between the lots awaiting processing, those in transit to the rack, and the tools available. This limited perspective can lead to longer queuing times and underutilized tools, as is evident in Figure 6.
An AI Scheduler addresses these inefficiencies. By offering an optimized workflow, it not only shortens the total cycle time but also minimizes variability in tool utilization. This in turn indirectly improves the cycle time of the toolset and overall fab efficiency. For example, Seagate deployed an AI Scheduler to photolithography and metrology bottleneck toolsets that were impacting cycle time. The scheduler reduced queue time by 4.3% and improved throughput by 9.4% at the photolithography toolset [5]. In the metrology toolset, the AI Scheduler reduced variability in tool utilization by 75% which resulted in reduced cycle time too, see Figure 7 [6].
Diffusion is a toolset that poses operational complexities due to its intricate batching options and several coupled process steps between cleaning and various furnace operations [7]. Implementing an AI Scheduler can mitigate many of these challenges, leading to reduced cycle time:
In the above examples of photo, metrology and diffusion toolsets, the AI Scheduler supports operators in achieving consistently high performance. To enhance the efficiency of the scheduling system in fabs predominantly run by operators with minimal AMHS (Automated Material Handling Systems) presence, pairing the scheduler with an operator guidance application, as detailed in one of our recent blogs on user-focused digitalisation, can be a valuable approach. Such software suggests the next task for an operator to execute.
The deployment of an AI Scheduler should focus on bottleneck toolsets - specifically, those that determine the fab's cycle time. Reducing the cycle time of a toolset will be inconsequential if that toolset is not a bottleneck. Consequently, fabs should consider the following two approaches:
Another factor to consider is that the actual operating curve of the fab is moving constantly based on changes in the operating conditions of the fab. For example, if the product mix changes substantially, this may impact the recipe distribution enabled in each tool and subsequently, the fab cycle time vs factory utilization curve would shift. The operating curve can also change if the fab layout changes, for example when new tools are added.
In Figure 9, we show an example wherein the cycle time versus factory utilization curve shifts upward after the product mix changes from mix A to mix B. This signifies an increased cycle time in the fab due to the change in product mix (with factory utilization slightly reduced under the new conditions). An autonomous AI Scheduler, as described by Sebastian Steele in a recent blog, should be able to understand the different trade-offs. For example, in Figure 10, the AI Scheduler could maintain the same utilization as before (60%) with product mix B, but cycle time would settle at 50 days (10 days more than with product mix A). Alternatively, the user can customize this trade-off so that, with product mix B, the fab moves back to the original cycle time of 40 days while accepting a lower utilization of 57%.
Trade-offs between different objectives at local toolsets may impact the fab cycle time. Consider the trade-offs in terms of batching costs versus cycle time. For instance, constructing larger batches might be crucial for high-cost operational tools such as furnaces in diffusion and implant. However, this approach could lead to an extended cycle time for the specific toolset and, consequently, an overall increase in fab cycle time.
Tool availability and efficiency significantly affect cycle time, akin to the influence of product mix on operating curves. If tools experience reduced reliability over time, the operating curve may shift upward, resulting in a worse cycle time for the same utilization. While the scheduler cannot directly control tool availability, strategically scheduling maintenance and integrating it with lot scheduling can positively impact cycle time. A dedicated future article will delve into this topic in more detail.
The topic of cycle time has been enriched with the introduction of the AI Scheduler, bringing a paradigm shift in how we perceive and manage the dynamics of front-end wafer fabs. As highlighted in our exploration, these schedulers do more than just automate – they optimize. By understanding and predicting the nuances of operations, from tool utilization to lot prioritization, advanced AI schedulers provide a roadmap not just to manage but to optimize cycle time, considering alternative trade-offs. In future articles we will talk about how scheduling maintenance and other operational aspects can be considered in a unified and autonomous AI platform, which we believe would be the next revolution after the innovations of the Arsenal of Venice, Ford and Toyota.
Author: Dennis Xenos, CTO and Cofounder, Flexciton
This two-part article aims to explain how we can improve cycle time in front-end semiconductor manufacturing through innovative solutions, moving beyond conventional lean manufacturing approaches. In part 1, we will discuss the importance of cycle time for semiconductor manufacturers and introduce the operating curve to relate cycle time to factory utilization. Part 2 will then explore strategies to enhance cycle time through advanced scheduling solutions, contrasting them with traditional methods.
Cycle time, the time to complete and ship products, is crucial for manufacturers. James P. Ignazio, in Optimizing Factory Performance, noted that top-tier manufacturers like Ford and Toyota have historically pursued the same goal to outpace competitors: speed [1]. This speed is achieved through fast factory cycle times.
This emphasis on speed had tangible benefits: Ford, for instance, could afford to pay workers double the average wage while dominating the automotive market. The Arsenal of Venice's accelerated ship assembly secured its status as a dominant city-state. Similarly, fast factory cycle times were central to Toyota’s successful lean manufacturing approach.
Semiconductor manufacturers grapple with extended cycle times that can often span 24 weeks [2]. This article will focus on manufacturing processes in front-end wafer fabs, as their contribution to the end product, such as a chip or hard drive disk head, spans several months. In contrast, back-end processes can be completed in a matter of weeks [3]. The principles discussed here, however, apply equally to back-end fabs.
Less variability in cycle time helps a wafer fab to achieve better predictability in the manufacturing process. Predictability enables optimal resource allocation; for instance, operators can be positioned at fab toolsets (known as workstations) based on anticipated workload from cycle time predictions. Recognizing idle periods of tools allows for improved maintenance scheduling which will result in reduction in unplanned maintenance. In an upcoming article (Part 2), we'll explore how synchronizing maintenance with production can further shorten cycle times.
Measuring and monitoring cycle times aids in identifying deviations from an expected variability. This, in turn, promptly highlights underlying operational issues, facilitating quicker issue resolution. Additionally, it assists industrial engineers in pinpointing bottlenecks, enabling a focused analysis of root causes and prompt corrective actions.
In the semiconductor industry, cycle time plays a pivotal role in broader supply chain orchestration:
Cycle time is a component of the total lead time of a product (it also includes procurement, transportation, etc). Therefore, total lead time can be reduced if the long cycle times in the front-end wafer fabs are reduced. A reliable cycle time nurtures trust with suppliers, laying the foundation for favorable partnerships and agreements. In essence, cycle time is not just about production; it's the heartbeat of the semiconductor supply chain ecosystem.
Understanding how cycle time impacts product delivery times is essential for the semiconductor industry. In some analyses, cycle time is confused with capacity; the authors of a McKinsey article stated, “Even with fabs operating at full capacity, they have not been able to meet demand, resulting in product lead times of six months or longer” [4]. In fact, in a fab operating at full capacity, product lead times increase precisely because the average manufacturing cycle time grows with utilization.
The fab cycle time metric defines the time required to produce a finished product in a wafer fab. The general cycle time term is also used to measure the time required to complete a specific process step (e.g. etching, coating) in a toolset, known as process step cycle time. The fab cycle time consists of the following time components as can be seen in Figure 2:
To measure and monitor cycle time, wafer fabs must track transactional data for each lot, capturing timestamps for events like the beginning and completion of processing at a tool. This data is gathered and stored by a Manufacturing Execution System (MES). Such transactional information can be utilized for historical operations analysis or for constructing models to forecast cycle times influenced by different operational factors. This foundation is crucial for formulating the operational curve of the fab, which we'll delve into in the subsequent part of this blog. As outlined in an article by Deenen et al., there are methods to develop data-driven simulations that accurately predict future cycle times [3].
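From such MES transactions, per-lot cycle times can be computed in a few lines. The event names and timestamps below are invented for illustration; a real MES schema would differ:

```python
from datetime import datetime

def lot_cycle_times(events):
    """Compute per-lot fab cycle time from MES-style transactions.

    `events` is an iterable of (lot_id, event_name, timestamp) tuples;
    a lot's cycle time is the span from its first to its last recorded event.
    """
    first, last = {}, {}
    for lot, _event, ts in events:
        if lot not in first or ts < first[lot]:
            first[lot] = ts
        if lot not in last or ts > last[lot]:
            last[lot] = ts
    return {lot: last[lot] - first[lot] for lot in first}
```

Aggregating these per-lot durations over time, and against WIP levels, is what yields the operating-curve data discussed next.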
As mentioned earlier, historical data can be used to generate the operating curve of a fab, which describes cycle time in relation to factory utilization. Figure 3 shows the graph of fab cycle time in days versus the utilization of the fab (%). The utilization of the fab is defined as the WIP divided by the total capacity of the fab.
We have found this method useful in understanding the fundamental principles of cycle time. The operating curve helps to explain how factory physics impact fab KPIs such as cycle time and fab utilization by showing the changes in the operating points:
In Figure 3, you can see that the current fab cycle time is 40 days when the factory utilization is at 60%. Theoretically, we could reduce the cycle time to 22 days. The difference between these two points is due to the inefficiencies that contribute to the factory cycle time as explained in the introduction of this section. In Part 2 of this blog, we will explore the various types of inefficiencies and examine how innovation can shift the operating curve to achieve lower cycle times while maintaining the same fab utilization.
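The shape of such an operating curve can be imitated with a toy queueing-style model. The constants below are chosen only so that the curve passes through the 40-days-at-60%-utilization point of Figure 3; they are not fitted to any real fab:

```python
def cycle_time_days(utilization, raw_days=16.0, variability=1.0):
    """Toy operating curve: cycle time grows with u/(1-u) as the fab fills up.

    `raw_days` stands in for processing with no queueing; the queueing term
    blows up as utilization approaches 100%, mirroring the curve's steep tail.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    queueing_factor = variability * utilization / (1.0 - utilization)
    return raw_days * (1.0 + queueing_factor)
```

In this model, halving the variability term at the same 60% utilization yields 28 days instead of 40: the kind of downward shift of the whole curve that reducing operational inefficiencies aims for.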
In summary, cycle time is not merely a production metric but the very pulse of the semiconductor manufacturing and supply chain. It governs revenues, shapes market responsiveness, and is pivotal in driving innovation. By understanding its nuances, semiconductor companies can not only optimize their operations but also gain a competitive edge. And while we've scratched the surface on its significance, the question remains: how can we further reduce and refine it? In part 2 of the C for Cycle Time blog, we will discover innovative techniques that promise to revolutionize cycle time management in wafer fabs.
Author: Dennis Xenos, CTO and Cofounder, Flexciton
Welcome to a nuanced exploration of pivotal considerations surrounding cloud adoption in the context of wafer fabrication. For those reading sceptically, uncertain about the merits of cloud integration, or perhaps prompted by concerns about lagging behind competitors—this blog endeavours to shed light on key areas of relevance.
For those reading this blog, the chances are you (or perhaps your boss) remain unconvinced about the merits of cloud adoption, yet are open to participating in the ongoing debate. Alternatively, there might be a concern of falling behind industry peers, perhaps heightened by recent security incidents such as the hacking of X-Fab. By the end of this short article, you will have gained valuable insights into the significant areas of cloud security, with the anticipation that such information will contribute to a more informed decision-making process.
Firstly, this is about using a cloud service, not running your own systems in the cloud. There are good arguments for that too, but that’s not what this article is about. So, the areas deemed worthy of exploration within this context include:
Recognising the complexity of these topics, we aim to take a segmented approach, with this blog dedicating its focus to the critical factor of security. Subsequent entries promise a comprehensive discussion on the remaining aspects.
We’re going to start with a simple one. Is your fab in any way connected to the internet? If you’re genuinely air-gapped, then it's reasonable to assume you already have a high level of security. But, if you’re not actually air-gapped, then you could actually improve your security by using a cloud service rather than running that service on-prem. Not instantly obvious perhaps, but let us explain.
The most compelling argument that exists for this is a simple one. Microsoft, AWS, IBM and Google all run respectable professional public clouds. If the service we’re talking about connecting to runs on any one of them, it’s fair to say they have similar approaches to cybersecurity.
Microsoft alone employs 3,500 cybersecurity professionals to maintain the security of Azure and spends heavily on cybersecurity improvements. That's an awful lot more person-hours on security than most fabs can apply from their own teams, and every single one of those professionals contributes to the security of a system running in that cloud.
“Aha!”, you say, “that tells me that the underlying public cloud infrastructure that the service is running on is probably as secure as anything connected to the world could be, but that doesn't mean that the service running on it is, right?” And yes, that’s a fair concern. As one of those service providers, we can confirm that we do not employ 3500 cybersecurity professionals. But because we run our service on Azure, we don’t need to. More than half our fight is already done for us and the remainder is a lot easier. For example:
In discussing the ease of these security measures, perhaps we’ve been slightly frivolous. However, despite the casual tone, the implementation of security measures when using cloud technologies is notably simpler when compared with organisations that manage their own hardware.
On the other hand, maybe you’re a fab that is actually air-gapped. You’ve got a solid on-site security team and excellent anti-social-engineering measures. Why introduce any risk? Fair question. We’d argue that this is going to become an increasingly challenging problem for you and maybe now’s the time to get ahead of the problem. Tools on your shop floor are already getting more modern, with virtualised metrology and off-site telemetry feeds for predicting failure rates using machine learning. Some of these systems just can’t be run on site and you’ll increasingly have to do without the more advanced aspects of your tooling to maintain your air gap. Over time this will take its toll, and your competitors will begin to pull away.
At this point it’s worth mentioning that SEMI has put together standards in the cybersecurity space. These address risks like bringing tools into your network with embedded software on them as well as defining how to set up your fab network to secure it, while still enabling external communication. We’d suggest that you should treat a cloud service no differently. It is entirely possible to use a managed service, in the cloud, connected to your fab, while still relying on purely outbound connectivity from your fab, leaving you entirely in control of what data is provided to the service and what you do with any data made available by that service in return.
If you’re already “internet-enabled” in your fab, then we’d argue that using a reputable public cloud service is actually more secure than running that same service on-prem.
If you’re completely offline, we’re not going to argue that using a cloud service is more secure than not connecting to the internet. What we are arguing, though, is that at some point you’re going to have to connect anyway, so you’re better off getting on top of this now rather than waiting until you’re forced into it by the market.
Author: Ray Cooke, VP of Engineering at Flexciton