In the previous post I focused on avoiding disk IO as much as possible and, when that was not possible, on making the most of buff/cache by grouping IO operations in time. This approach can make our ETL processes run several times faster. In the two examples the numbers were:
- Avoiding IO entirely was 11.3 times faster
- Using buff/cache was almost 4 times faster
All the examples used a dataset that was already on disk, so no real network operations took place. In this post I am going to focus on network operations, again using GNU parallel.
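To give a flavour of what that looks like, here is a minimal sketch (not taken from the post) of fetching a list of URLs concurrently with GNU parallel and curl; the `urls.txt` file, the `downloads/` directory and the job count of 8 are assumptions for illustration.

```bash
# Hypothetical example: urls.txt holds one URL per line.
mkdir -p downloads

# Run up to 8 downloads at a time; {} is the URL read from stdin,
# {#} is the job sequence number used to name the output file.
cat urls.txt | parallel -j 8 'curl -sS -o downloads/{#}.dat {}'
```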