OpenACC Programming and Best Practices Guide

  • Introduction
    • Writing Portable Code
    • What is OpenACC?
  • Accelerating an Application with OpenACC
    • OpenACC Directive Syntax
    • Porting Cycle
    • Heterogeneous Computing Best Practices
    • Case Study - Jacobi Iteration
  • Assess Application Performance
    • Baseline Profiling
    • Additional Profiling
    • Case Study - Analysis
  • Parallelize Loops
    • The Kernels Construct
    • The Parallel Construct
    • Differences Between Parallel and Kernels
    • The Loop Construct
    • Routine Directive
    • Atomic Operations
    • Case Study - Parallelize
  • Optimize Data Locality
    • Data Regions
    • Data Clauses
    • Unstructured Data Lifetimes
    • Update Directive
    • Best Practice: Offload Inefficient Operations to Maintain Data Locality
    • Case Study - Optimize Data Locality
  • Optimize Loops
    • Efficient Loop Ordering
    • OpenACC’s 3 Levels of Parallelism
    • Mapping Parallelism to the Hardware
    • Collapse Clause
    • Routine Parallelism
    • Case Study - Optimize Loops
  • OpenACC Interoperability
    • The Host Data Region
    • Using Device Pointers
    • Obtaining Device and Host Pointer Addresses
    • Additional Vendor-Specific Interoperability Features
  • Advanced OpenACC Features
    • Asynchronous Operation
    • Multi-device Programming

© Copyright 2023, OpenACC.org.