by Hannu Krosing
In this talk I describe ways to do terabyte-scale, multi-machine data warehousing using PostgreSQL as the "storage and query processing layer" and the "Skype scalability triplets": pl/proxy, pgbouncer and the (largely Python-based) skytools for loading the data into the cluster. Easy map-reduce-style processing of huge data sets using pl/proxy, SQL, pl/pgsql and/or pl/pythonu is demonstrated, and the differences from typical NoSQL map-reduce are shown. Writing the "transform" part of ETL (extract-transform-load) as Python plugins in a near-real-time data collection pipeline for this kind of data warehouse is also demonstrated. Finally, a short comparison with other distributed data processing approaches is given, including guidance on which one to use for which task.
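The map-reduce part boils down to a per-partition "map" function that is fanned out with pl/proxy's RUN ON ALL, plus a plain-SQL "reduce" over the combined result on the proxy node. A minimal sketch, assuming a hypothetical pl/proxy cluster named 'dwcluster' and a page_views table that exists on every partition:

    -- On every partition node: the "map" step, pre-aggregating locally.
    CREATE FUNCTION partial_view_counts(OUT o_url text, OUT o_cnt bigint)
    RETURNS SETOF record AS $$
        SELECT url, count(*) FROM page_views GROUP BY url;
    $$ LANGUAGE sql;

    -- On the proxy node: same signature, but the body is pl/proxy,
    -- which runs the call on all partitions and returns the union of results.
    CREATE FUNCTION partial_view_counts(OUT o_url text, OUT o_cnt bigint)
    RETURNS SETOF record AS $$
        CLUSTER 'dwcluster';
        RUN ON ALL;
    $$ LANGUAGE plproxy;

    -- The "reduce" step is ordinary SQL over the partial results.
    SELECT o_url, sum(o_cnt) AS total_views
      FROM partial_view_counts()
     GROUP BY o_url
     ORDER BY total_views DESC;

The per-partition map function can just as well be written in pl/pgsql or pl/pythonu when plain SQL is not enough; the point of the pattern is that there is no separate job framework, only database functions and SQL aggregation on the proxy.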
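The "transform" plugins are, in this setup, small pieces of Python called for each event on its way from the queue into the warehouse tables. The sketch below only illustrates the shape of such a plugin; the event layout, the names and the fact-table row format are hypothetical, and the surrounding skytools/pgq consumer that would call it is indicated only by a comment:

    import json
    from datetime import datetime, timezone

    class UserEventTransform:
        """Hypothetical ETL 'transform' plugin: turns one raw queue event
        into zero or more rows destined for a warehouse fact table."""

        fact_table = "facts.user_events"

        def transform(self, ev_type, ev_data):
            # ev_data is assumed to be a JSON payload carried by the queue event.
            if ev_type != "user_action":
                return []  # uninteresting events are simply dropped
            payload = json.loads(ev_data)
            return [(
                int(payload["user_id"]),
                payload["action"],
                datetime.fromtimestamp(payload["ts"], tz=timezone.utc),
            )]

    # In the real near-real-time pipeline this transform would be invoked for
    # every event of a batch inside a skytools/pgq consumer, and the resulting
    # rows bulk-loaded (e.g. with COPY) into the fact table.
    if __name__ == "__main__":
        t = UserEventTransform()
        print(t.transform("user_action",
                          '{"user_id": 42, "action": "login", "ts": 1300000000}'))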