CPPuddle
This repository was initially created to explore how best to use HPX and Kokkos together. For fine-grained GPU tasks, we needed a way to avoid excessive allocations of single-use GPU buffers (as allocations block the device for all streams) and the repeated creation/deletion of GPU executors (as those are usually tied to a stream, which is expensive to create as well).
We currently test/use CPPuddle in Octo-Tiger, together with HPX-Kokkos. In this use case, allocating GPU buffers for all sub-grids in advance would have wasted a lot of memory. On the other hand, unified memory would have caused unnecessary GPU-to-CPU page migrations (as the old input data gets overwritten anyway). Allocating buffers on the fly would have blocked the device. Hence, we currently test this buffer management solution!
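The recycling idea described above can be sketched in plain C++. This is a minimal illustration under stated assumptions, not CPPuddle's actual API: the names `recycling_pool`, `acquire` and `release` are hypothetical, and host memory (`std::vector<double>`) stands in for GPU buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical sketch, NOT the CPPuddle API: a pool that keeps released
// buffers around for reuse instead of freeing them. Host memory stands in
// for device memory here.
class recycling_pool {
public:
  using buffer = std::unique_ptr<std::vector<double>>;

  // Hand out a buffer with exactly `size` elements. A previously released
  // buffer of the same size is reused if one exists; its old contents are
  // left untouched, since the caller is expected to overwrite them.
  buffer acquire(std::size_t size) {
    auto &unused = unused_[size];
    if (!unused.empty()) {
      buffer b = std::move(unused.back());
      unused.pop_back();
      ++reused_;
      return b;
    }
    ++allocated_;  // only this path performs a real allocation
    return std::make_unique<std::vector<double>>(size);
  }

  // Return a buffer to the pool instead of deallocating it.
  void release(buffer b) {
    assert(b != nullptr);
    unused_[b->size()].push_back(std::move(b));
  }

  std::size_t allocations() const { return allocated_; }
  std::size_t reuses() const { return reused_; }

private:
  // Unused buffers, grouped by element count.
  std::map<std::size_t, std::vector<buffer>> unused_;
  std::size_t allocated_ = 0;
  std::size_t reused_ = 0;
};
```

In this sketch, a task that repeatedly acquires and releases a buffer of the same size triggers only one real allocation; every later `acquire` is served from the pool. On a GPU, that is the step that would avoid blocking the device.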
The documentation of the current master branch is available here. In particular, the public functionality for memory recycling is available in the namespace memory_recycling, the executor pools are available in the namespace executor_recycling, and the work aggregation (kernel fusion) functionality is available in the namespace work_aggregation.
The submodules can be used to obtain the optional dependencies that are required for testing the header-only utilities. If these tests are not required, the submodules (and the respective build scripts in /scripts) can safely be ignored.
If installed correctly, CPPuddle can be used in other CMake-based projects via