The first integrated circuits were invented at Texas Instruments and Fairchild Semiconductor in 1958 and 1959. Today, semiconductor manufacturing is a $600 billion industry, and microchips are ubiquitous and impact our lives in ever-increasing ways. To achieve such astonishing growth, academia and industry have had to innovate constantly, researching new production technologies. While much has been said about Moore's law and the push towards ever higher transistor densities, innovations in how the billion-dollar factories producing these chips are run have received less attention. This article focuses on innovations in scheduling: algorithms which assign lots to machines, decide in which order they should run, and ensure any required secondary resources (e.g. reticles) are available. These decisions can significantly impact the throughput and efficiency of wafer fabs.
Many innovative technologies in scheduling were first proposed by researchers and have, over time, been adapted in manufacturing. They include:
Early academic research on dispatching rules dates back to the 1980s. Authors at the time already highlighted the significant impact scheduling can have on semiconductor manufacturing. They experimented with different types of dispatching rules, ranging from simple first-in-first-out (FIFO) rules to more bespoke rules focused on particular bottleneck tools. Over time, dispatching rules have evolved from fairly simple to increasingly complex. Rule-based dispatching systems quickly became the state-of-the-art in the industry and continue to be popular for several reasons: they can be intuitive and easy to implement, yet allow covering varying requirements. There are, however, also many situations in which dispatching rules may perform poorly: they have no foresight and generally look only at a single tool and therefore often struggle with load balancing between tools. They also struggle with more advanced constraints such as time constraints or auxiliary resources, e.g. reticles in photolithography. More generally, dispatching systems are a mature technology that has been pushed to its limits and is unlikely to lead to significant increases in productivity and yields.
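To make the idea of a dispatching rule concrete, here is a minimal Python sketch of a rule-based dispatcher for a single idle tool. It compares FIFO with the critical ratio rule, a well-known due-date-based rule. The `Lot` fields, the function names, and all numbers are illustrative assumptions, not taken from any production dispatching system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lot:
    lot_id: str
    arrival: float    # hour at which the lot joined the queue (assumed field)
    due: float        # hour at which the lot is due out of the fab (assumed field)
    remaining: float  # hours of processing still ahead of the lot (assumed field)


def fifo(lot: Lot, now: float) -> float:
    """First-in-first-out: the lot that arrived earliest goes first."""
    return lot.arrival


def critical_ratio(lot: Lot, now: float) -> float:
    """Time left until the due date divided by remaining work; smaller is more urgent."""
    return (lot.due - now) / lot.remaining


def dispatch(queue, rule, now):
    """Pick the next lot for an idle tool: the one with the lowest rule value."""
    return min(queue, key=lambda lot: rule(lot, now))


queue = [
    Lot("A", arrival=0.0, due=100.0, remaining=20.0),
    Lot("B", arrival=1.0, due=30.0, remaining=25.0),
    Lot("C", arrival=2.0, due=60.0, remaining=10.0),
]
now = 5.0

print(dispatch(queue, fifo, now).lot_id)            # A
print(dispatch(queue, critical_ratio, now).lot_id)  # B
```

Both rules only ever inspect the queue in front of one tool at one moment in time, which is exactly the lack of foresight described above.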
For these reasons, focus has shifted over time to alternative technologies, especially deterministic scheduling based on mixed-integer programming or constraint programming. In the academic literature, these approaches began to appear more frequently in the 1990s. Early contributions focused on analysing the complexity of the wafer fab scheduling problem and solved the resulting optimization problem using heuristic techniques, but slowly moved towards rigorously scheduling single machines, tackling one particular aspect of the problem at a time. Due to the limited scope deterministic techniques could initially tackle, their adoption in industry lagged behind the academic discussion.
The last twenty years have seen deterministic scheduling techniques mature and schedule larger and more complex fab areas. In the academic literature, authors moved from focusing on single (batching) tools, to entire toolsets or larger areas of the fab including re-entrant flows. They also started including more and more operational constraints such as sequence-dependent setup and processing times, time constraints, or secondary resources such as reticles. In order to achieve this increase in scale and complexity, researchers have applied a large number of optimization techniques, and often combined rigorous mathematical programming methods with heuristic approaches. Some have used general purpose meta-heuristics, such as genetic algorithms or simulated annealing, while others have developed bespoke heuristics for fab scheduling, such as the shifting bottleneck heuristic.
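As a toy illustration of what optimization-based scheduling adds over dispatching, the sketch below sequences four lots on one tool with a sequence-dependent setup. It uses exhaustive search as a stand-in for a mixed-integer or constraint programming solver, which is only viable for tiny instances. The lot data, the setup time, and the objective (sum of completion times) are assumptions chosen for illustration.

```python
from itertools import permutations

# Hypothetical lots: processing time in hours and recipe family.
lots = {"L1": (2.0, "X"), "L2": (1.0, "Y"), "L3": (2.0, "X"), "L4": (1.0, "Y")}
SETUP = 3.0  # assumed hours lost whenever the tool switches recipe family


def total_completion_time(order):
    """Sum of lot completion times on one tool, including family-change setups."""
    clock, total, family = 0.0, 0.0, None
    for lot_id in order:
        proc, fam = lots[lot_id]
        if family is not None and fam != family:
            clock += SETUP  # sequence-dependent setup
        clock += proc
        total += clock
        family = fam
    return total


fifo_order = tuple(lots)  # run lots in arrival order
best_order = min(permutations(lots), key=total_completion_time)

print(total_completion_time(fifo_order))  # 34.0
print(total_completion_time(best_order))  # 19.0
```

Four lots give only 24 possible sequences, but the count grows factorially with the number of lots. That growth is why the literature combines rigorous mathematical programming with decomposition and heuristics rather than enumerating sequences.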
As the size of problems optimization-based scheduling techniques could solve grew, the industry started to explore how to adopt these methods in practice. For example, in 2006, IBM announced that it had successfully used a combination of mixed-integer programming and constraint programming to schedule an area of a fab with up to 500 lot-steps and that this had led to a significant reduction in cycle time. Our own technology at Flexciton leverages mathematical optimization and smart decomposition, combined with modern cloud computing, to efficiently schedule entire fabs. One key advantage of using cloud technology is the ability to access huge amounts of computational power. This makes it possible to break down complicated problems, deliver accurate schedules every few minutes, and adapt the solution strategy to the complexity at hand. Additionally, it enables responsive adjustments as events unfold in real time, allowing for a truly dynamic approach to scheduling.
Optimization-based scheduling’s trajectory from an academic niche to a high-impact technology has partially been accelerated by two major trends:
The process has been accompanied by considerable improvements in productivity, as scheduling is able to overcome many of the downsides of dispatching: it can look ahead in time, balance WIP across tools, and improve fab-wide objectives such as cost or cycle-time. A major advantage of scheduling is that it can both increase yields when demand is high and reduce cost when demand is low.
A discussion of scheduling in wafer fabs would not be complete without a word on simulation models. Simulation models are technically not scheduling algorithms - they require dispatching rules or deterministic scheduling inside them to decide machine assignment and sequencing. But they have been used to evaluate and compare different scheduling approaches from the very beginning. They were also quickly adopted by industry and have, for example, been used by STMicroelectronics to re-prioritise lots and by Infineon to help identify better dispatching rules. The development of highly reliable simulation models could greatly increase their use for performance evaluation and scheduling.
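The sketch below shows, in miniature, how a simulation model wraps a dispatching rule and is used to compare rules. It simulates a single tool with deterministic arrivals and reports the mean cycle time under FIFO and under shortest-processing-time-first. The function names and the lot data are illustrative assumptions; real fab simulation models cover many tools, re-entrant flows, and stochastic events.

```python
def simulate(lots, rule):
    """Simulate one tool over a list of (arrival_hour, process_hours) lots.

    `rule` picks the next lot from the waiting queue whenever the tool is free.
    Returns the mean cycle time (completion minus arrival) across all lots.
    """
    pending = sorted(lots)  # lots that have not arrived yet, by arrival time
    queue, clock, cycle_times = [], 0.0, []
    while pending or queue:
        # Move every lot that has arrived by now into the waiting queue.
        while pending and pending[0][0] <= clock:
            queue.append(pending.pop(0))
        if not queue:  # tool is idle: jump ahead to the next arrival
            clock = pending[0][0]
            continue
        lot = rule(queue)
        queue.remove(lot)
        clock += lot[1]  # process the chosen lot
        cycle_times.append(clock - lot[0])
    return sum(cycle_times) / len(cycle_times)


def fifo_rule(queue):
    """Earliest arrival first."""
    return min(queue, key=lambda lot: lot[0])


def spt_rule(queue):
    """Shortest processing time first."""
    return min(queue, key=lambda lot: lot[1])


lots = [(0.0, 6.0), (1.0, 4.0), (2.0, 1.0), (3.0, 1.0)]

print(simulate(lots, fifo_rule))  # 8.25
print(simulate(lots, spt_rule))   # 6.75
```

The simulation itself makes no scheduling decision. It only plays out whatever rule is plugged in, which is why its usefulness for evaluation depends on how faithfully it reproduces the fab.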
More reliable simulation models are also important in light of recent trends in academic literature, which may provide a glimpse into the future of wafer fab scheduling. Rigid dispatching rules that need to be (re)tuned frequently may soon be replaced by deep reinforcement learning agents which learn dispatching rules that improve overall fab objectives. In some studies, such systems have been shown to perform as well as dispatching systems based on expert knowledge. If and when the industry adopts such techniques on a large scale remains to be seen. Since they require accurate simulation models as training environments, they can be extremely computationally intensive, and their adoption will largely depend on the development of faster training and simulation models. The combination of self-learning dispatching systems, and comprehensive, scalable scheduling models may well hold the key to unlocking unprecedented improvements in fab productivity.
Flexciton aspires to be the key enabler in this transition, bringing state-of-the-art scheduling technology to the shop floor in a modern, sophisticated, and user-friendly platform unlike anything else on the market. Despite the enormous challenges that come with the scale of this endeavour, the initial results are very encouraging; cloud-based optimization solutions can indeed bring a step change to streamlining wafer fab scheduling while delivering consistent efficiency gains.
In part 2, Dennis explores strategies to enhance cycle time through advanced scheduling solutions, contrasting them with traditional methods. He uses the operating curve, this time to demonstrate how AI scheduling and operational factors, such as product mix, can significantly impact cycle time.
In the first part of 'C for Cycle Time', we explored the essence of cycle time in front-end wafer fabs and its significance for semiconductor companies. We introduced the operating curve, which illustrates the relationship between fab cycle time and factory utilization, as well as the power of predictability and the ripple effects cycle time can have across the supply chain.
In part 2, we will explore strategies to enhance cycle time through advanced scheduling solutions, contrasting them with traditional methods. We will use the operating curve, this time to demonstrate how advanced scheduling and operational factors, such as product mix and factory load, can significantly impact fab cycle time.
By embracing the principles of traditional Lean Manufacturing, essentially focused on reducing waste in production, cycle time can be effectively reduced [1]. Here are a few strategies that can help improve fab cycle time:
The implementation of an advanced AI scheduler can facilitate most of the strategies noted above, leading to an improvement in cycle time with significantly less effort demanded from a wafer fab compared to alternatives such as acquiring new tools. In the next sections we are going to see how this technology can make your existing tools move wafers faster without changing any hardware!
In this section, we delve into how an advanced AI scheduler (AI Scheduler) can maintain factory utilization while reducing cycle time.
First, let’s define what an AI Scheduler is. It is fab software whose core engine is powered by AI models such as mathematical optimization. It can adapt to ongoing real-time changes in fab conditions, including variations in product mix, tool downtimes, and processing times. Its decisions can deliver better results on fab objectives, such as cycle time, than heuristic-based legacy scheduling systems. More aspects of an advanced AI scheduler can be found in our previous article, A is for AI. The AI Scheduler schedules fab production in alignment with lean manufacturing principles by optimally sequencing lots and strategically batching and assigning them to tools.
Figure 5 shows an example of how an AI Scheduler can successfully shift the cycle time from the original operating curve closer to the theoretical operating curve. As a result, cycle time is now 30 days at 60% factory utilization. This can be accomplished by enhancing fab efficiency through measures such as minimizing idle times, reducing re-work, and mitigating variability in operations, among other strategies. In the next sections, we will show two examples, in metrology and diffusion, of how optimal scheduling improves cycle time.
Many wafer fabs employ a tool pull-system for dispatching. In this approach, operators typically decide which idle tool to attend to, either based on their experience or at times, randomly. Once at the tool, they then select the highest priority lots from those available for processing. A drawback of this system is that operators don't have a comprehensive view of the compatibility between the lots awaiting processing, those in transit to the rack, and the tools available. This limited perspective can lead to longer queuing times and underutilized tools, evident in Figure 6.
An AI Scheduler addresses these inefficiencies. By offering an optimized workflow, it not only shortens the total cycle time but also minimizes variability in tool utilization. This in turn indirectly improves the cycle time of the toolset and overall fab efficiency. For example, Seagate deployed an AI Scheduler to photolithography and metrology bottleneck toolsets that were impacting cycle time. The scheduler reduced queue time by 4.3% and improved throughput by 9.4% at the photolithography toolset [5]. In the metrology toolset, the AI Scheduler reduced variability in tool utilization by 75% which resulted in reduced cycle time too, see Figure 7 [6].
Diffusion is a toolset that poses operational complexities due to its intricate batching options and several coupled process steps between cleaning and various furnace operations [7]. Implementing an AI Scheduler can mitigate many of these challenges, leading to reduced cycle time:
In the above examples of photo, metrology and diffusion toolsets, the AI Scheduler can support operators to achieve consistently high performance. To enhance the efficiency of the scheduling system in fabs predominantly run by operators with minimal AMHS (Automated Material Handling Systems) presence, pairing the scheduler with an operator guidance application, as detailed in one of our recent blogs on user-focused digitalisation, can be a valuable approach. This software will suggest the next task required to be executed by an operator.
The deployment of an AI Scheduler should focus on bottleneck toolsets - specifically, those that determine the fab's cycle time. Reducing the cycle time of a toolset will be inconsequential if that toolset is not a bottleneck. Consequently, fabs should consider the following two approaches:
Another factor to consider is that the actual operating curve of the fab is moving constantly based on changes in the operating conditions of the fab. For example, if the product mix changes substantially, this may impact the recipe distribution enabled in each tool and subsequently, the fab cycle time vs factory utilization curve would shift. The operating curve can also change if the fab layout changes, for example when new tools are added.
In Figure 9, we show an example wherein the cycle time versus factory utilization curve for product mix A shifts upward. This signifies an increased cycle time in the fab due to the recent changes in the product mix (and the factory utilization was slightly reduced under these new conditions). An autonomous AI Scheduler, as described by Sebastian Steele in a recent blog, should be able to understand the different trade-offs. For example, in Figure 10, the AI Scheduler could keep the same utilization as before (60%) with the new product mix B, but the cycle time would then be 50 days (10 days more than with product mix A). Alternatively, the user can customize this trade-off so that the fab returns to the previous cycle time of 40 days with product mix B, at the price of a lower utilization of 57%.
Trade-offs between different objectives at local toolsets may impact the fab cycle time. Consider the trade-offs in terms of batching costs versus cycle time. For instance, constructing larger batches might be crucial for high-cost operational tools such as furnaces in diffusion and implant. However, this approach could lead to an extended cycle time for the specific toolset and, consequently, an overall increase in fab cycle time.
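A small worked example helps to quantify this trade-off. The sketch below assumes lots arrive at a furnace one at a time at a fixed interval and that a run starts as soon as the batch is full. It then reports the mean time a lot waits for the batch to form and the run cost per lot. The arrival interval, run cost, and function name are hypothetical values chosen for illustration.

```python
def batch_tradeoff(batch_size, interarrival_h, run_cost):
    """Return (mean wait in hours to form the batch, cost per lot) for one furnace run.

    Assumes one lot arrives every `interarrival_h` hours and the run starts
    as soon as `batch_size` lots are present.
    """
    # The i-th lot to arrive (i = 0 .. b-1) waits for the remaining b-1-i arrivals.
    waits = [(batch_size - 1 - i) * interarrival_h for i in range(batch_size)]
    mean_wait = sum(waits) / batch_size
    cost_per_lot = run_cost / batch_size
    return mean_wait, cost_per_lot


for size in (1, 2, 4, 6):
    wait, cost = batch_tradeoff(size, interarrival_h=2.0, run_cost=1200.0)
    print(f"batch of {size}: mean wait {wait:.1f} h, cost per lot {cost:.0f}")
```

Under these assumptions, growing the batch from one lot to four cuts the cost per lot from 1200 to 300 but adds three hours of average waiting per lot. A scheduler has to weigh this against the cycle time target of the toolset.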
Tool availability and efficiency significantly affect cycle time, akin to the influence of product mix on operating curves. If tools experience reduced reliability over time, the operating curve may shift upward, resulting in a worse cycle time for the same utilization. While the scheduler cannot directly control tool availability, strategically scheduling maintenance and integrating it with lot scheduling can positively impact cycle time. A dedicated future article will delve into this topic in more detail.
The introduction of AI Schedulers brings a paradigm shift in how we perceive and manage cycle time in front-end wafer fabs. As highlighted in our exploration, these schedulers do more than just automate: they optimize. By understanding and predicting the nuances of operations, from tool utilization to lot prioritization, advanced AI schedulers provide a roadmap to not just manage but optimize cycle time while weighing alternative trade-offs. In future articles, we will talk about how scheduling maintenance and other operational aspects can be considered in a unified and autonomous AI platform, which we believe will be the next revolution after the innovations of the Arsenal of Venice, Ford and Toyota.
Author: Dennis Xenos, CTO and Cofounder, Flexciton
This two-part article aims to explain how we can improve cycle time in front-end semiconductor manufacturing through innovative solutions, moving beyond conventional lean manufacturing approaches. In part 1, we will discuss the importance of cycle time for semiconductor manufacturers and introduce the operating curve to relate cycle time to factory utilization. Part 2 will then explore strategies to enhance cycle time through advanced scheduling solutions, contrasting them with traditional methods.
Cycle time, the time to complete and ship products, is crucial for manufacturers. James P. Ignazio, in Optimizing Factory Performance, noted that top-tier manufacturers like Ford and Toyota have historically pursued the same goal to outpace competitors: speed [1]. This speed is achieved through fast factory cycle times.
This emphasis on speed had tangible benefits: Ford, for instance, could afford to pay workers double the average wage while dominating the automotive market. The Arsenal of Venice's accelerated ship assembly secured its status as a dominant city-state. Similarly, fast factory cycle times were central to Toyota’s successful lean manufacturing approach.
Furthermore, semiconductor manufacturers grapple with extended cycle times that can often span 24 weeks [2]. This article will focus on manufacturing processes in front-end wafer fabs, as their contribution to the end product, such as a chip or a hard disk drive head, spans several months. In contrast, back-end processes can be completed in a matter of weeks [3]. However, the principles discussed apply equally to back-end fabs.
Less variability in cycle time helps a wafer fab to achieve better predictability in the manufacturing process. Predictability enables optimal resource allocation; for instance, operators can be positioned at fab toolsets (known as workstations) based on anticipated workload from cycle time predictions. Recognizing idle periods of tools allows for improved maintenance scheduling, which in turn reduces unplanned maintenance. In an upcoming article (Part 2), we'll explore how synchronizing maintenance with production can further shorten cycle times.
Measuring and monitoring cycle times aids in identifying deviations from an expected variability. This, in turn, promptly highlights underlying operational issues, facilitating quicker issue resolution. Additionally, it assists industrial engineers in pinpointing bottlenecks, enabling a focused analysis of root causes and prompt corrective actions.
In the semiconductor industry, cycle time plays a pivotal role in broader supply chain orchestration:
Cycle time is one component of a product's total lead time, which also includes procurement, transportation, and so on. Therefore, total lead time can be reduced if the long cycle times in the front-end wafer fabs are reduced. A reliable cycle time nurtures trust with suppliers, laying the foundation for favorable partnerships and agreements. In essence, cycle time is not just about production; it's the heartbeat of the semiconductor supply chain ecosystem.
Understanding how cycle time impacts product delivery times is essential for the semiconductor industry. Some analyses confuse cycle time with capacity. For example, the authors of a McKinsey article stated: “Even with fabs operating at full capacity, they have not been able to meet demand, resulting in product lead times of six months or longer” [4]. In fact, operating a fab at full capacity is itself a cause of long lead times, because average manufacturing cycle time increases as utilization rises.
The fab cycle time metric defines the time required to produce a finished product in a wafer fab. The general cycle time term is also used to measure the time required to complete a specific process step (e.g. etching, coating) in a toolset, known as process step cycle time. The fab cycle time consists of the following time components as can be seen in Figure 2:
To measure and monitor cycle time, wafer fabs must track transactional data for each lot, capturing timestamps for events like the beginning and completion of processing at a tool. This data is gathered and stored by a Manufacturing Execution System (MES). Such transactional information can be utilized for historical operations analysis or for constructing models to forecast cycle times influenced by different operational factors. This foundation is crucial for formulating the operating curve of the fab, which we'll delve into in the subsequent part of this blog. As outlined in an article by Deenen et al., there are methods to develop data-driven simulations that accurately predict future cycle times [3].
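As a minimal sketch of how such transactional data turns into cycle time figures, the Python example below derives queue time and process time per step, plus the overall cycle time, for a single lot. The event names (`arrive`, `start`, `end`), the record layout, and the timestamps are assumptions made for illustration; a real MES has its own schema.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Hypothetical MES transactions for one lot: (step, event, timestamp).
events = [
    ("etch", "arrive", "2024-01-01 08:00"),
    ("etch", "start", "2024-01-01 11:00"),
    ("etch", "end", "2024-01-01 12:30"),
    ("coat", "arrive", "2024-01-01 13:00"),
    ("coat", "start", "2024-01-01 17:00"),
    ("coat", "end", "2024-01-01 18:00"),
]


def hours_between(later, earlier):
    """Elapsed time between two datetimes, in hours."""
    return (later - earlier).total_seconds() / 3600


def step_times(events):
    """Return {step: (queue_hours, process_hours)} from arrive/start/end events."""
    stamps = {}
    for step, event, ts in events:
        stamps.setdefault(step, {})[event] = datetime.strptime(ts, FMT)
    return {
        step: (hours_between(t["start"], t["arrive"]), hours_between(t["end"], t["start"]))
        for step, t in stamps.items()
    }


per_step = step_times(events)
first = datetime.strptime(events[0][2], FMT)
last = datetime.strptime(events[-1][2], FMT)
cycle_time_h = hours_between(last, first)

print(per_step)      # {'etch': (3.0, 1.5), 'coat': (4.0, 1.0)}
print(cycle_time_h)  # 10.0
```

In this made-up lot history, 7 of the 10 hours are queue time and only 2.5 hours are processing. The remaining half hour is the move between the two steps.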
As we mentioned earlier, historic data can be used to generate the operating curve of a fab which describes the cycle time in relation to the factory utilization. Figure 3 shows the graph of the fab cycle time in days versus the utilization of the fab (%). The utilization of the fab is defined as the WIP divided by the total capacity of the fab.
We have found this method useful in understanding the fundamental principles of cycle time. The operating curve helps to explain how factory physics impact fab KPIs such as cycle time and fab utilization by showing the changes in the operating points:
In Figure 3, you can see that the current fab cycle time is 40 days when the factory utilization is at 60%. Theoretically, we could reduce the cycle time to 22 days. The difference between these two points is due to the inefficiencies that contribute to the factory cycle time as explained in the introduction of this section. In Part 2 of this blog, we will explore the various types of inefficiencies and examine how innovation can shift the operating curve to achieve lower cycle times while maintaining the same fab utilization.
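To get a feel for the shape of such a curve, the sketch below uses one commonly used approximation from the factory physics literature, in which cycle time equals the raw process time multiplied by 1 + alpha * u / (1 - u), where u is utilization and alpha lumps together variability and other inefficiencies. This formula, the raw process time of 10 days, and both alpha values are assumptions. The alphas were picked only so that the curves pass through the 40-day and 22-day points at 60% utilization quoted above; they are not fitted to the fab data behind Figure 3.

```python
def cycle_time_days(utilization, raw_process_days, alpha):
    """Cycle time from the approximation RPT * (1 + alpha * u / (1 - u))."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return raw_process_days * (1.0 + alpha * utilization / (1.0 - utilization))


RAW_PROCESS_DAYS = 10.0   # assumed raw process time
ALPHA_ACTUAL = 2.0        # assumed: gives 40 days at 60% utilization
ALPHA_THEORETICAL = 0.8   # assumed: gives 22 days at 60% utilization

for u in (0.3, 0.6, 0.9):
    actual = cycle_time_days(u, RAW_PROCESS_DAYS, ALPHA_ACTUAL)
    ideal = cycle_time_days(u, RAW_PROCESS_DAYS, ALPHA_THEORETICAL)
    print(f"utilization {u:.0%}: actual {actual:.0f} days, theoretical {ideal:.0f} days")
```

Under these assumptions, the same fab that needs 40 days at 60% utilization would need about 190 days at 90%, which illustrates why cycle time climbs so sharply as a fab approaches full capacity.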
In summary, cycle time is not merely a production metric but the very pulse of the semiconductor manufacturing and supply chain. It governs revenues, shapes market responsiveness, and is pivotal in driving innovation. By understanding its nuances, semiconductor companies can not only optimize their operations but also gain a competitive edge. And while we've scratched the surface on its significance, the question remains: how can we further reduce and refine it? In part 2 of the C for Cycle Time blog, we will discover innovative techniques that promise to revolutionize cycle time management in wafer fabs.
Author: Dennis Xenos, CTO and Cofounder, Flexciton
Ray Cooke delves into the pivotal considerations surrounding cloud adoption in the context of wafer fabrication. For those reading sceptically, uncertain about the merits of cloud integration, or perhaps prompted by concerns about lagging behind competitors—this blog endeavours to shed light on key areas of relevance.
Welcome to a nuanced exploration of pivotal considerations surrounding cloud adoption in the context of wafer fabrication. For those reading sceptically, uncertain about the merits of cloud integration, or perhaps prompted by concerns about lagging behind competitors—this blog endeavours to shed light on key areas of relevance.
For those reading this blog, the chances are you (or perhaps your boss) remain unconvinced about the merits of cloud adoption, yet are open to participating in the ongoing debate. Alternatively, there might be a concern of falling behind industry peers, perhaps heightened by recent security incidents such as the hacking of X-Fab. By the end of this short article, you will have gained valuable insights into the significant areas of cloud security, with the anticipation that such information will contribute to a more informed decision-making process.
Firstly, this is about using a cloud service, not running your own systems in the cloud. There are good arguments for that too, but that’s not what this article is about. So, the areas deemed worthy of exploration within this context include:
Recognising the complexity of these topics, we aim to take a segmented approach, with this blog dedicating its focus to the critical factor of security. Subsequent entries promise a comprehensive discussion on the remaining aspects.
We’re going to start with a simple one. Is your fab in any way connected to the internet? If you’re genuinely air-gapped, then it's reasonable to assume you already have a high level of security. But, if you’re not actually air-gapped, then you could actually improve your security by using a cloud service rather than running that service on-prem. Not instantly obvious perhaps, but let us explain.
The most compelling argument that exists for this is a simple one. Microsoft, AWS, IBM and Google all run respectable professional public clouds. If the service we’re talking about connecting to runs on any one of them, it’s fair to say they have similar approaches to cybersecurity.
Microsoft alone employs 3500 cybersecurity professionals to maintain the security of Azure and together they spend a lot on cybersecurity improvements. That’s an awful lot more person-hours on security than most are going to be able to apply from their team. Every single one of them is contributing to the security of a system running in their cloud.
“Aha!”, you say, “that tells me that the underlying public cloud infrastructure that the service is running on is probably as secure as anything connected to the world could be, but that doesn't mean that the service running on it is, right?” And yes, that’s a fair concern. As one of those service providers, we can confirm that we do not employ 3500 cybersecurity professionals. But because we run our service on Azure, we don’t need to. More than half our fight is already done for us and the remainder is a lot easier. For example:
In discussing the ease of these security measures, perhaps we’ve been slightly frivolous. However, despite the casual tone, the implementation of security measures when using cloud technologies is notably simpler when compared with organisations that manage their own hardware.
On the other hand, maybe you’re a fab that is actually air-gapped. You’ve got a solid on-site security team and excellent anti-social-engineering measures. Why introduce any risk? Fair question. We’d argue that this is going to become an increasingly challenging problem for you and maybe now’s the time to get ahead of the problem. Tools on your shop floor are already getting more modern, with virtualised metrology and off-site telemetry feeds for predicting failure rates using machine learning. Some of these systems just can’t be run on site and you’ll increasingly have to do without the more advanced aspects of your tooling to maintain your air gap. Over time this will take its toll, and your competitors will begin to pull away.
At this point it’s worth mentioning that SEMI has put together standards in the cybersecurity space. These address risks like bringing tools into your network with embedded software on them as well as defining how to set up your fab network to secure it, while still enabling external communication. We’d suggest that you should treat a cloud service no differently. It is entirely possible to use a managed service, in the cloud, connected to your fab, while still relying on purely outbound connectivity from your fab, leaving you entirely in control of what data is provided to the service and what you do with any data made available by that service in return.
If you’re already “internet-enabled” in your fab, then we’d argue that using a reputable public cloud service is actually more secure than running that same service on-prem.
If you’re completely offline, we’re not going to argue that using a cloud service is more secure than not connecting to the internet. What we are arguing, though, is that at some point you’re going to have to connect anyway, so you’re better off getting on top of this now rather than waiting until you’re forced into it by the market.
Author: Ray Cooke, VP of Engineering at Flexciton