Found thread data sharing on the GPU. Expect degraded performance due to data globalization. [OMP112]

This missed remark indicates that a globalized value was found on the target device and was replaced neither with stack memory by OMP110 nor with shared memory by OMP111. Globalization that has not been removed must be handled by the runtime and will significantly impact performance.

The OpenMP standard requires that threads be able to share their data with each other. However, this is not true by default when offloading to a target device such as a GPU. Threads on a GPU cannot share their data unless it is first placed in global or shared memory. To create standards-compliant code, the Clang compiler globalizes any variables that could potentially be shared between threads. In the majority of cases, globalized variables can either be returned to a thread-local stack or pushed to shared memory. However, in a few cases globalization is necessary and will cause a performance penalty.
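
As a rough illustration of the common case, the sketch below takes the address of a local variable but never lets that address leave the thread. The function names `square_via_pointer` and `sum_of_squares` are hypothetical and do not come from the Clang documentation. It is an assumption, not a guarantee, that the optimizer will move `local` back to the thread-local stack here; the outcome depends on the compiler version and optimization level.

```cpp
// Hypothetical helper: the address of `local` is taken, so the compiler
// may initially treat it as potentially shared, but the pointer is only
// dereferenced by the thread that created it.
static int square_via_pointer(int v) {
  int local = v;
  int *q = &local;  // address stays inside this function
  return *q * *q;
}

// Hypothetical driver: sums i*i for i in [0, n) inside a target region.
// Without offloading support the pragma is ignored and the loop runs
// serially on the host with the same result.
int sum_of_squares(int n) {
  int total = 0;
#pragma omp target teams distribute parallel for reduction(+ : total) map(tofrom : total)
  for (int i = 0; i < n; ++i)
    total += square_via_pointer(i);
  return total;
}
```

Compiling a file like this with `-Rpass=openmp-opt` and `-Rpass-missed=openmp-opt` shows which remark, if any, the compiler emits for `local`.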

Examples

This example shows legitimate data sharing on the device. It is a convoluted example, but it is completely compliant with the OpenMP standard. If globalization were not added, this code would produce different results on different target devices.

#include <omp.h>
#include <cstdio>

#pragma omp declare target
static int *p;
#pragma omp end declare target

void foo() {
  int x = omp_get_thread_num();
  if (omp_get_thread_num() == 1)
    p = &x;

#pragma omp barrier

  printf ("Thread %d: %d\n", omp_get_thread_num(), *p);
}

int main() {
#pragma omp target parallel
  foo();
}

$ clang++ -fopenmp -fopenmp-targets=nvptx64 -O1 -Rpass-missed=openmp-opt omp112.cpp
omp112.cpp:9:7: remark: Found thread data sharing on the GPU. Expect degraded performance
due to data globalization. [OMP112] [-Rpass-missed=openmp-opt]
int x = omp_get_thread_num();
    ^

A less convoluted example of globalization that cannot be removed occurs when calling functions that aren’t visible from the current translation unit.

extern void use(int *x);

void foo() {
  int x;
  use(&x);
}

int main() {
#pragma omp target parallel
  foo();
}

$ clang++ -fopenmp -fopenmp-targets=nvptx64 -O1 -Rpass-missed=openmp-opt omp112.cpp
omp112.cpp:4:7: remark: Found thread data sharing on the GPU. Expect degraded performance
due to data globalization. [OMP112] [-Rpass-missed=openmp-opt]
int x;
    ^
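
Since the remark above is triggered by a callee the compiler cannot see, one possible mitigation is to make the definition visible in the same translation unit. The sketch below assumes a trivial body for `use`, which the original example only declares, and adds a hypothetical `run_region` wrapper with a `max` reduction so the result can be observed. Whether the remark actually disappears depends on the compiler version and on what the real `use` does with the pointer.

```cpp
// Assumed definition: the original example only declares use().
// With the body visible, the compiler can check whether the pointer
// is ever shared with another thread.
static void use(int *x) { *x = 1; }

int foo() {
  int x = 0;
  use(&x);  // the address does not escape beyond use()
  return x;
}

// Hypothetical wrapper: every thread computes the same value, and the
// max reduction combines the per-thread results without a data race.
int run_region() {
  int out = 0;
#pragma omp target parallel reduction(max : out) map(tofrom : out)
  {
    int v = foo();
    if (v > out)
      out = v;
  }
  return out;
}
```

If the definition cannot be moved into the translation unit, the globalization remains and the runtime cost described above still applies.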

Diagnostic Scope

OpenMP target offloading missed remark.