If you’re storing large amounts of data that you need quick access to, a standard text file isn’t going to cut it. The kinds of cosmological simulations that I run generate huge amounts of data, and to analyse them I need to be able to access the exact data that I want quickly and painlessly. HDF5 is one answer. It’s a powerful binary data format with no upper limit on the file size. It provides […]

# Category: Python

## What is HDF5? – Python and HDF5 by Andrew Collette

What Exactly Is HDF5? HDF5 is a great mechanism for storing large numerical arrays of homogeneous type, for data models that can be organized hierarchically and benefit from tagging of datasets with arbitrary metadata. It’s quite different from SQL-style relational databases. HDF5 has quite a few organizational tricks up its sleeve, but if you find yourself needing to enforce relationships between values in various tables, or wanting to perform JOINs on your data, a relational database is probably more appropriate. Likewise, […]

## h5py example : Organizing Data and Metadata, Coping with Large Data Volumes

Organizing Data and Metadata

Suppose we have a NumPy array that represents some data from an experiment:

    >>> import numpy as np
    >>> temperature = np.random.random(1024)
    >>> temperature
    array([ 0.44149738, 0.7407523 , 0.44243584, …, 0.19018119, 0.64844851, 0.55660748])

Let’s also imagine that these data points were recorded from a weather station that sampled the temperature, say, every 10 seconds. In order to make sense of the data, we have to record that sampling interval, or “delta-T,” […]
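One way to keep the sampling interval next to the data is to store it as an HDF5 attribute on the dataset. The following is a minimal sketch, not necessarily the approach the full post takes: the file name `weather.hdf5`, the dataset name `temperature`, and the attribute name `dt` are assumptions chosen for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Fake "experiment" data, as in the excerpt above
temperature = np.random.random(1024)

# Hypothetical file location inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), "weather.hdf5")

# Write the array and attach the sampling interval as metadata
with h5py.File(path, "w") as f:
    dset = f.create_dataset("temperature", data=temperature)
    dset.attrs["dt"] = 10.0  # assumed attribute name; seconds between samples

# Read it back: the metadata travels with the dataset
with h5py.File(path, "r") as f:
    dset = f["temperature"]
    dt = dset.attrs["dt"]
    stored = dset[:]      # read the whole array into memory
    first_six = dset[:6]  # or read only a slice from disk
```

Because the attribute is stored on the dataset itself, anyone who opens the file can recover the interval without a separate notes file.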

## h5py : Quick Start Guide

1. Install
2. Core concepts

An HDF5 file is a container for two kinds of objects: datasets, which are array-like collections of data, and groups, which are folder-like containers that hold datasets and other groups. The most fundamental thing to remember when using h5py is: groups work like dictionaries, and datasets work like NumPy arrays. Suppose someone has sent you an HDF5 file, mytestfile.hdf5. (To create this file, read Appendix: Creating a file.) The very first thing you’ll need […]
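The "groups work like dictionaries, datasets work like NumPy arrays" idea can be sketched as follows. This is an illustrative example rather than the guide's own code: it builds a throwaway in-memory file (h5py's `core` driver with `backing_store=False`), and the names `demo.hdf5`, `subgroup`, and `mydataset` are assumptions.

```python
import h5py
import numpy as np

# In-memory file: nothing is written to disk when it is closed
with h5py.File("demo.hdf5", "w", driver="core", backing_store=False) as f:
    # Groups behave like dictionaries: create, list keys, index by name
    grp = f.create_group("subgroup")
    grp.create_dataset("mydataset", data=np.arange(100))
    keys = list(f.keys())
    has_group = "subgroup" in f

    # Datasets behave like NumPy arrays: shape, dtype, slicing
    dset = f["subgroup/mydataset"]
    shape = dset.shape
    tail = dset[-3:]              # reads only the last three values
    total = int(dset[...].sum())  # dset[...] loads everything as an ndarray
```

Slicing a dataset returns an ordinary NumPy array, so `tail` remains usable after the file is closed.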

## Matplotlib : bar, stem, pie, histogram, scatter, contour, …

1. Bar chart
2. Stem plot
3. Pie chart
4. Histogram
5. Scatter plot
6. Imshow
7. Contour plot
8. 3D surface plot

## numpy : descriptive statistics

Descriptive statistics: number of data points (count), mean (average), variance, standard deviation, maximum, minimum, median, and quartiles.

1. count
2. mean
3. variance
4. standard deviation
5. maximum and minimum
6. median
7. quartile
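The statistics listed above can each be computed with a single NumPy call. A minimal sketch on a small made-up sample (the array `x` is illustrative data, not taken from the post):

```python
import numpy as np

# Small illustrative sample
x = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)

count = x.size                       # number of data points
mean = x.mean()                      # arithmetic mean
variance = x.var()                   # population variance (ddof=0)
sample_variance = x.var(ddof=1)      # unbiased sample variance
std = x.std()                        # population standard deviation
maximum, minimum = x.max(), x.min()
median = np.median(x)
q1, q3 = np.percentile(x, [25, 75])  # first and third quartiles

print(count, mean, variance, std, maximum, minimum, median, q1, q3)
```

Note that `var` and `std` default to the population formulas; pass `ddof=1` for the sample versions.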

## Matplotlib samples – introduction

1. Matplotlib usage
2. Title, axis label
3. Legend
4. Bar chart

## numpy sample codes

1. NumPy array
2. NumPy slicing
3. NumPy integer indexing
4. NumPy boolean indexing
5. NumPy operations
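As a quick illustration of the topics in this list, here is a short sketch on a made-up 3×4 array (the variable names are chosen for this example only):

```python
import numpy as np

# Array creation
a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

# Slicing: first two rows, columns 1 and 2
block = a[:2, 1:3]

# Integer indexing: pick elements (0, 0), (1, 1), (2, 0)
picked = a[[0, 1, 2], [0, 1, 0]]

# Boolean indexing: keep only the even values
evens = a[a % 2 == 0]

# Elementwise operation and a reduction along an axis
doubled = a * 2
col_sums = a.sum(axis=0)
```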

## NumPy’s Structured Arrays

Sample Code Result
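The post's own sample code is not included in this excerpt. As a stand-in, the following minimal sketch shows what a structured array looks like; the field names (`name`, `age`, `weight`) and the values are assumptions made up for illustration.

```python
import numpy as np

# A structured dtype: each record has a string, an int, and a float field
people = np.zeros(3, dtype=[("name", "U10"), ("age", "i4"), ("weight", "f8")])

# Fill the array one field (column) at a time
people["name"] = ["Alice", "Bob", "Cathy"]
people["age"] = [25, 45, 37]
people["weight"] = [55.0, 85.5, 68.0]

first = people[0]    # a single record
ages = people["age"]  # a single field across all records

# Boolean masks work on fields just like on ordinary arrays
under_40 = people[people["age"] < 40]["name"]
```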

## Setup numpy

Install Miniconda

Miniconda gives you the Python interpreter itself, along with a command-line tool called conda which operates as a cross-platform package manager geared toward Python packages, similar in spirit to the apt or yum tools that Linux users might be familiar with. To get started, download and install the Miniconda package (make sure to choose a version with Python 3), and then install the core packages used in this book:

    [~]$ conda install numpy pandas scikit-learn matplotlib seaborn jupyter