LLVM OpenMP
libomp_interface.h
// clang-format off
// This file does not contain any code; it just contains additional text and formatting
// for doxygen.

//===----------------------------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

/*! @mainpage LLVM OpenMP* Runtime Library Interface
@section sec_intro Introduction

This document describes the interface provided by the
LLVM OpenMP\other runtime library to the compiler.
Routines that are directly called as simple functions by user code are
not currently described here, since their definition is in the OpenMP
specification available from http://openmp.org

The aim here is to explain the interface from the compiler to the runtime.

The overall design is described, and each function in the interface
has its own description. (At least, that's the ambition; we may not be there yet.)
@section sec_building Building the Runtime

For build instructions, please see https://openmp.llvm.org/Building.html.

@section sec_supported Supported RTL Build Configurations

The architectures supported are IA-32 architecture, Intel&reg;&nbsp;64, and
Intel&reg;&nbsp;Many Integrated Core Architecture. The build configurations
supported are shown in the table below.

<table border=1>
<tr><th> <th>icc/icl<th>gcc<th>clang
<tr><td>Linux\other OS<td>Yes(1,5)<td>Yes(2,4)<td>Yes(4,6,7)
<tr><td>FreeBSD\other<td>Yes(1,5)<td>Yes(2,4)<td>Yes(4,6,7,8)
<tr><td>OS X\other<td>Yes(1,3,4)<td>No<td>Yes(4,6,7)
<tr><td>Windows\other OS<td>Yes(1,4)<td>No<td>No
</table>
(1) On IA-32 architecture and Intel&reg;&nbsp;64, icc/icl versions 12.x
are supported (12.1 is recommended).<br>
(2) gcc version 4.7 is supported.<br>
(3) For icc on OS X\other, OS X\other version 10.5.8 is supported.<br>
(4) Intel&reg;&nbsp;Many Integrated Core Architecture not supported.<br>
(5) On Intel&reg;&nbsp;Many Integrated Core Architecture, icc/icl versions 13.0 or later are required.<br>
(6) Clang\other version 3.3 is supported.<br>
(7) Clang\other currently does not offer a software-implemented 128-bit extended
precision type. Thus, all entry points reliant on this type are removed
from the library and cannot be called in the user program. The following
functions are not available:
@code
    __kmpc_atomic_cmplx16_*
    __kmpc_atomic_float16_*
    __kmpc_atomic_*_fp
@endcode
(8) Community contribution provided AS IS, not tested by Intel.

Supported Architectures: IBM(R) Power 7 and Power 8
<table border=1>
<tr><th> <th>gcc<th>clang
<tr><td>Linux\other OS<td>Yes(1,2)<td>Yes(3,4)
</table>
(1) On Power 7, gcc version 4.8.2 is supported.<br>
(2) On Power 8, gcc version 4.8.2 is supported.<br>
(3) On Power 7, clang version 3.7 is supported.<br>
(4) On Power 8, clang version 3.7 is supported.<br>

@section sec_frontend Front-end Compilers that work with this RTL

The following compilers are known to do compatible code generation for
this RTL: icc/icl, gcc. Code generation is discussed in more detail
later in this document.

@section sec_outlining Outlining

The runtime interface is based on the idea that the compiler
"outlines" sections of code that are to run in parallel into separate
functions that can then be invoked in multiple threads. For instance,
simple code like this

@code
void foo()
{
#pragma omp parallel
    {
        ... do something ...
    }
}
@endcode
is converted into something that looks conceptually like this (where
the names used are merely illustrative; the real library function
names will be used later after we've discussed some more issues...)

@code
static void outlinedFooBody()
{
    ... do something ...
}

void foo()
{
    __OMP_runtime_fork(outlinedFooBody, (void*)0); // Not the real function name!
}
@endcode

@subsection SEC_SHAREDVARS Addressing shared variables

In real uses of the OpenMP\other API there are normally references
from the outlined code to shared variables that are in scope in the containing function.
Therefore the containing function must be able to address
these variables. The runtime supports two alternative ways of doing
this.

@subsubsection SEC_SEC_OT Current Technique
The technique currently supported by the runtime library is to receive
a separate pointer to each shared variable that can be accessed from
the outlined function. This is what is shown in the example below.

We hope soon to provide an alternative interface to support the
alternate implementation described in the next section. The
alternative implementation has performance advantages for small
parallel regions that have many shared variables.

@subsubsection SEC_SEC_PT Future Technique
The idea is to treat the outlined function as though it
were a lexically nested function, and pass it a single argument which
is the pointer to the parent's stack frame. Provided that the compiler
knows the layout of the parent frame when it is generating the outlined
function it can then access the up-level variables at appropriate
offsets from the parent frame. This is a classical compiler technique
from the 1960s to support languages like Algol (and its descendants)
that support lexically nested functions.

The main benefit of this technique is that there is no code required
at the fork point to marshal the arguments to the outlined function.
Since the runtime knows statically how many arguments must be passed to the
outlined function, it can easily copy them to the thread's stack
frame. Therefore the performance of the fork code is independent of
the number of shared variables that are accessed by the outlined
function.

If it is hard to determine the stack layout of the parent while generating the
outlined code, it is still possible to use this approach by collecting all of
the variables in the parent that are accessed from outlined functions into
a single `struct` which is placed on the stack, and whose address is passed
to the outlined functions. In this way the offsets of the shared variables
are known (since they are inside the struct) without needing to know
the complete layout of the parent stack-frame. From the point of view
of the runtime either of these techniques is equivalent, since in either
case it only has to pass a single argument to the outlined function to allow
it to access shared variables.

A scheme like this is how gcc\other generates outlined functions.

@section SEC_INTERFACES Library Interfaces
The library functions used for specific parts of the OpenMP\other language implementation
are documented in different modules.

 - @ref BASIC_TYPES fundamental types used by the runtime in many places
 - @ref DEPRECATED functions that are in the library but are no longer required
 - @ref STARTUP_SHUTDOWN functions for initializing and finalizing the runtime
 - @ref PARALLEL functions for implementing `omp parallel`
 - @ref THREAD_STATES functions for supporting thread state inquiries
 - @ref WORK_SHARING functions for work sharing constructs such as `omp for`, `omp sections`
 - @ref THREADPRIVATE functions to support thread private data, copyin etc
 - @ref SYNCHRONIZATION functions to support `omp critical`, `omp barrier`, `omp master`, reductions etc
 - @ref ATOMIC_OPS functions to support atomic operations
 - @ref STATS_GATHERING macros to support developer profiling of libomp
 - Documentation on tasking has still to be written...

@section SEC_EXAMPLES Examples
@subsection SEC_WORKSHARING_EXAMPLE Work Sharing Example
This example shows the code generated for a parallel for with reduction and dynamic scheduling.

@code
extern float foo( void );

int main () {
    int i;
    float r = 0.0;
    #pragma omp parallel for schedule(dynamic) reduction(+:r)
    for ( i = 0; i < 10; i ++ ) {
        r += foo();
    }
}
@endcode

The transformed code looks like this.
@code
extern float foo( void );

int main () {
    static int zero = 0;
    auto int gtid;
    auto float r = 0.0;
    __kmpc_begin( & loc3, 0 );
    // The gtid is not actually required in this example so could be omitted;
    // we show its initialization here because it is often required for calls into
    // the runtime and should be locally cached like this.
    gtid = __kmpc_global_thread_num( & loc3 );
    __kmpc_fork_call( & loc7, 1, main_7_parallel_3, & r );
    __kmpc_end( & loc0 );
    return 0;
}

struct main_10_reduction_t_5 { float r_10_rpr; };

static kmp_critical_name lck = { 0 };
static ident_t loc10; // loc10.flags should contain KMP_IDENT_ATOMIC_REDUCE bit set
                      // if the compiler has generated an atomic reduction.

void main_7_parallel_3( int *gtid, int *btid, float *r_7_shp ) {
    auto int i_7_pr;
    auto int lower, upper, liter, incr;
    auto struct main_10_reduction_t_5 reduce;
    reduce.r_10_rpr = 0.F;
    liter = 0;
    __kmpc_dispatch_init_4( & loc7, *gtid, 35, 0, 9, 1, 1 );
    while ( __kmpc_dispatch_next_4( & loc7, *gtid, & liter, & lower, & upper, & incr ) ) {
        for( i_7_pr = lower; upper >= i_7_pr; i_7_pr ++ )
            reduce.r_10_rpr += foo();
    }
    switch( __kmpc_reduce_nowait( & loc10, *gtid, 1, 4, & reduce, main_10_reduce_5, & lck ) ) {
    case 1:
        *r_7_shp += reduce.r_10_rpr;
        __kmpc_end_reduce_nowait( & loc10, *gtid, & lck );
        break;
    case 2:
        __kmpc_atomic_float4_add( & loc10, *gtid, r_7_shp, reduce.r_10_rpr );
        break;
    default:;
    }
}

void main_10_reduce_5( struct main_10_reduction_t_5 *reduce_lhs,
                       struct main_10_reduction_t_5 *reduce_rhs )
{
    reduce_lhs->r_10_rpr += reduce_rhs->r_10_rpr;
}
@endcode

@defgroup BASIC_TYPES Basic Types
Types that are used throughout the runtime.

@defgroup DEPRECATED Deprecated Functions
Functions in this group are for backwards compatibility only, and
should not be used in new code.

@defgroup STARTUP_SHUTDOWN Startup and Shutdown
These functions are for library initialization and shutdown.

@defgroup PARALLEL Parallel (fork/join)
These functions are used for implementing <tt>\#pragma omp parallel</tt>.

@defgroup THREAD_STATES Thread Information
These functions return information about the currently executing thread.

@defgroup WORK_SHARING Work Sharing
These functions are used for implementing
<tt>\#pragma omp for</tt>, <tt>\#pragma omp sections</tt>, <tt>\#pragma omp single</tt> and
<tt>\#pragma omp master</tt> constructs.

When handling loops, there are different functions for each of the signed and unsigned 32 and 64 bit integer types
which have the name suffixes `_4`, `_4u`, `_8` and `_8u`. The semantics of each of the functions is the same,
so they are only described once.

Static loop scheduling is handled by @ref __kmpc_for_static_init_4 and friends. Only a single call is needed,
since the iterations to be executed by any given thread can be determined as soon as the loop parameters are known.

Dynamic scheduling is handled by the @ref __kmpc_dispatch_init_4 and @ref __kmpc_dispatch_next_4 functions.
The init function is called once in each thread outside the loop, while the next function is called each
time that the previous chunk of work has been exhausted.

@defgroup SYNCHRONIZATION Synchronization
These functions are used for implementing barriers.

@defgroup THREADPRIVATE Thread private data support
These functions support copyin/out and thread private data.

@defgroup STATS_GATHERING Statistics Gathering from OMPTB
These macros support profiling the libomp library. Use --stats=on when building with build.pl to enable
and then use the KMP_* macros to profile (through counts or clock ticks) libomp during execution of an OpenMP program.

@section sec_stats_env_vars Environment Variables

This section describes the environment variables relevant to stats-gathering in libomp.

@code
KMP_STATS_FILE
@endcode
This environment variable names an output file to which statistics are appended, *NOT OVERWRITTEN*, if the file already exists. If this environment variable is undefined, the statistics are output to stderr.

@code
KMP_STATS_THREADS
@endcode
This environment variable requests that per-thread statistics be printed in addition to aggregate statistics. Each thread's statistics will be shown as well as the collective sum of all threads. The values "true", "on", "1" and "yes" all enable per-thread statistics.

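For example, a run might combine the two variables like this (the program name is a placeholder, and this assumes a stats-enabled build as described above):

```shell
# Append statistics to stats.txt (created if absent) and include
# per-thread breakdowns; ./my_openmp_program is hypothetical.
KMP_STATS_FILE=stats.txt KMP_STATS_THREADS=on ./my_openmp_program
```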
@defgroup TASKING Tasking support
These functions support tasking constructs.

@defgroup USER User visible functions
These functions can be called directly by the user, but are runtime library specific, rather than being OpenMP interfaces.

*/