1 Data Structures
1.1 Lists
1.1.1 Iterables and generators
Iterable
>>> lst = [1, 2, 3]
>>> for i in lst:
...     print(i)
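A `for` loop first calls `iter()` on the iterable to get an iterator. It then calls `next()` on that iterator until `StopIteration` is raised. The minimal sketch below writes that protocol out by hand. The helper name `manual_for` is hypothetical and used only for illustration.

```python
def manual_for(iterable):
    """Collect items the way a for loop would, calling iter() and next() explicitly."""
    collected = []
    iterator = iter(iterable)  # ask the iterable for a fresh iterator
    while True:
        try:
            item = next(iterator)  # fetch the next value
        except StopIteration:  # raised once the iterator is exhausted
            break
        collected.append(item)
    return collected


lst = [1, 2, 3]
print(manual_for(lst))  # [1, 2, 3]
print(manual_for(lst))  # a list hands out a new iterator each time: [1, 2, 3]
```

A list can be looped over any number of times because each `iter()` call returns a new iterator.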
Generator
>>> mygenerator = (x*x for x in range(3))
>>> for i in mygenerator:
...     print(i)
0
1
4
A generator cannot be read again and again. It computes its values on the fly instead of storing them, so it can be iterated only once.
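Here is a side-by-side sketch of a list comprehension (square brackets, values stored) and a generator expression (parentheses, values produced lazily). It shows that only the list can be consumed twice. The variable names are illustrative.

```python
squares_list = [x * x for x in range(3)]  # list comprehension: all values kept in memory
squares_gen = (x * x for x in range(3))   # generator expression: values computed on demand

print(list(squares_list), list(squares_list))  # [0, 1, 4] [0, 1, 4]
print(list(squares_gen))  # [0, 1, 4]
print(list(squares_gen))  # []: the generator is already exhausted
```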
Yield
yield is used like return, except that calling the function returns a generator instead of running its body:
>>> def creerGenerateur():
...     mylist = range(3)
...     for i in mylist:
...         yield i*i
...
>>> generateur = creerGenerateur()  # creates a generator
>>> print(generateur)  # generateur is an object!
<generator object creerGenerateur at 0x2b484b9addc0>
>>> for i in generateur:
...     print(i)
0
1
4
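The generator body runs lazily. Nothing executes when the function is called, and each `next()` resumes the body until the next `yield`. The sketch below makes this visible with a hypothetical variant, `traced_squares`, that logs each step into a `trace` list. The logging exists only for illustration.

```python
def traced_squares(trace):
    """Yield the squares of 0, 1, 2, recording each step in `trace` (illustrative only)."""
    for i in range(3):
        trace.append(f"computing {i}")
        yield i * i
    trace.append("done")


trace = []
generateur = traced_squares(trace)
print(trace)             # []: the body has not started yet
print(next(generateur))  # 0
print(trace)             # ['computing 0']
print(list(generateur))  # [1, 4]: consumes the remaining values
print(trace)             # ['computing 0', 'computing 1', 'computing 2', 'done']
```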
1.1.2 Transform a list
Standard transformation
sequence = ['a', 'b', 'c']
sequence = [element.upper() for element in sequence]  ## ['A', 'B', 'C']
Multiply a string by an integer
sequence = [1, 2, 3]
print(['a' * nombre for nombre in sequence])  ## repeats "a" nombre times: ['a', 'aa', 'aaa']
Apply filters
nombres = range(10)
print([nombre for nombre in nombres if nombre % 2 == 0]) ## keeps only even numbers
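Transforming and filtering can be combined in one comprehension. In the short sketch below, the name `carres_pairs` is illustrative. The `map()`/`filter()` line is shown only as an equivalent formulation with built-ins.

```python
nombres = range(10)

# Transform and filter at once: squares of the even numbers only
carres_pairs = [nombre * nombre for nombre in nombres if nombre % 2 == 0]
print(carres_pairs)  # [0, 4, 16, 36, 64]

# Same result with the built-ins map() and filter()
same = list(map(lambda n: n * n, filter(lambda n: n % 2 == 0, nombres)))
print(same == carres_pairs)  # True
```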
Beginner
>>> nombres = range(10)
>>> sommes = []
>>> for nombre in nombres:
...     if nombre % 2 == 0:
...         somme = 0
...         for i in range(nombre):
...             somme += i
...         sommes.append(somme)
...
>>> print(sommes)
[0, 1, 6, 15, 28]
Intermediate
>>> sommes = []
>>> for nombre in range(10):
...     if nombre % 2 == 0:
...         sommes.append(sum(range(nombre)))
...
>>> print(sommes)
[0, 1, 6, 15, 28]
Advanced
>>> print([sum(range(nombre)) for nombre in range(10) if nombre % 2 == 0])
Guru
>>> [sum(range(nombre)) for nombre in range(0, 10, 2)]
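As a sanity check, the four versions above can be wrapped in functions and compared. The function names are hypothetical labels for the four levels. The last line is not in the original: it uses the closed form sum(range(n)) = n(n - 1)/2, which yields the same list without any inner loop.

```python
def beginner():
    sommes = []
    for nombre in range(10):
        if nombre % 2 == 0:
            somme = 0
            for i in range(nombre):
                somme += i
            sommes.append(somme)
    return sommes


def intermediate():
    sommes = []
    for nombre in range(10):
        if nombre % 2 == 0:
            sommes.append(sum(range(nombre)))
    return sommes


def advanced():
    return [sum(range(nombre)) for nombre in range(10) if nombre % 2 == 0]


def guru():
    # range(0, 10, 2) yields only the even numbers, so no filter is needed
    return [sum(range(nombre)) for nombre in range(0, 10, 2)]


results = [f() for f in (beginner, intermediate, advanced, guru)]
print(all(r == [0, 1, 6, 15, 28] for r in results))  # True

# Closed form: 0 + 1 + ... + (n - 1) == n * (n - 1) // 2
closed_form = [n * (n - 1) // 2 for n in range(0, 10, 2)]
print(closed_form)  # [0, 1, 6, 15, 28]
```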
1.2 Series
A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index.
1.2.1 Create a Series
import pandas as pd
import numpy as np
s = pd.Series(data, index=index)
Here, data can be many different things, for example:
- a Python dict
- an ndarray
- a scalar value (like 5)
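d = {'a': 0., 'b': 1., 'c': 2.}
s = pd.Series(d)  # from a dict: the keys become the index
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])  # from an ndarray
pd.Series(5., index=['a', 'b', 'c', 'd', 'e'])  # from a scalar: repeated for every label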
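Pass a dict together with an explicit index, and pandas takes the values for the labels in that index; any label missing from the dict gets NaN. The small sketch below shows this, plus the scalar case. The variable names are illustrative.

```python
import pandas as pd

d = {'a': 0., 'b': 1., 'c': 2.}

s_from_dict = pd.Series(d)  # index comes from the dict keys, in insertion order
print(s_from_dict.index.tolist())  # ['a', 'b', 'c']

s_reindexed = pd.Series(d, index=['b', 'c', 'd', 'a'])  # 'd' is not a key of d
print(s_reindexed.isna().tolist())  # [False, False, True, False]

s_scalar = pd.Series(5., index=['a', 'b', 'c'])  # the scalar is repeated for each label
print(s_scalar.tolist())  # [5.0, 5.0, 5.0]
```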
1.2.2 Manipulate Series
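s[:3], s[:-2] -> select by position (slicing)
s[s > s.median()] -> filter: keep only the values above the median
s.iloc[[4, 3, 1]] -> select by a list of positions (s[[4, 3, 1]] also works on a non-integer index, but recent pandas versions deprecate that positional fallback)
np.exp(s) -> apply a NumPy function element-wise to every value
'e' in s -> returns True/False depending on whether 'e' is an index label (values are not checked)
s.get('f', "does not exist") -> returns the default value if the label 'f' does not exist
s = pd.Series(np.random.randn(5), name='something') -> give the Series a name
s2 = s.rename("different") -> returns a copy with a different name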
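The sketch below runs the operations listed above on fixed values instead of np.random.randn(5), so every result is predictable. The sample values are made up for illustration. Positional selection goes through s.iloc.

```python
import numpy as np
import pandas as pd

# Fixed sample values (illustrative) so the results are reproducible
s = pd.Series([0.5, -1.0, 2.0, 0.0, 3.0], index=['a', 'b', 'c', 'd', 'e'])

print(s.iloc[:3].tolist())               # [0.5, -1.0, 2.0]: first three by position
print(s[s > s.median()].index.tolist())  # ['c', 'e']: above the median, 0.5
print(s.iloc[[4, 3, 1]].tolist())        # [3.0, 0.0, -1.0]: positions 4, 3, 1
print(np.exp(s)['d'])                    # 1.0: exp(0.0), computed element-wise
print('e' in s, 3.0 in s)                # True False: `in` checks labels, not values
print(s.get('f', 'does not exist'))      # does not exist
print(s.rename('different').name, s.name)  # different None: rename returns a copy
```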
1.3 DataFrames
1.3.1 Create a dataframe
1.3.1.1 From dict of series
In [32]: d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
....: 'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
In [33]: df = pd.DataFrame(d)
Select some rows (index labels) and columns; a requested label missing from d, such as 'three', becomes a column of NaN:
pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])
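With a dict of Series, pandas aligns the columns on their index labels. The resulting index is the union of the Series indexes, and missing entries become NaN. The short sketch below shows this and the row/column selection above.

```python
import pandas as pd

d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df.index.tolist())   # ['a', 'b', 'c', 'd']: union of both indexes
print(df.loc['d', 'one'])  # nan: 'one' has no label 'd'

subset = pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])
print(subset.columns.tolist())       # ['two', 'three']
print(subset['three'].isna().all())  # True: 'three' is not a key of d
```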
1.3.1.2 From dict of ndarrays / lists
In [39]: d = {'one' : [1., 2., 3., 4.],
....: 'two' : [4., 3., 2., 1.]}
You may also pass an index when creating the DataFrame; it must have the same length as the lists:
pd.DataFrame(d, index=['a', 'b', 'c', 'd'])
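Without an explicit index, pandas numbers the rows 0, 1, 2, and so on. Plain lists, unlike Series, are not aligned by label, so they must all have the same length. The sketch below shows both behaviours. The flag name `raised` is only there to make the outcome checkable.

```python
import pandas as pd

d = {'one': [1., 2., 3., 4.],
     'two': [4., 3., 2., 1.]}

df_default = pd.DataFrame(d)  # no index given: default integer index
print(df_default.index.tolist())  # [0, 1, 2, 3]

df_labeled = pd.DataFrame(d, index=['a', 'b', 'c', 'd'])  # labels passed at construction
print(df_labeled.loc['c', 'two'])  # 2.0

# Lists of different lengths cannot be aligned, so pandas refuses them
raised = False
try:
    pd.DataFrame({'one': [1., 2., 3.], 'two': [4., 3.]})
except ValueError as err:
    raised = True
    print("ValueError:", err)
```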
1.3.1.3 From list of dicts
In [47]: data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
In [48]: pd.DataFrame(data2)
Out[48]:
   a   b     c
0  1   2   NaN
1  5  10  20.0
In [49]: pd.DataFrame(data2, index=['first', 'second'])
Out[49]:
        a   b     c
first   1   2   NaN
second  5  10  20.0
In [50]: pd.DataFrame(data2, columns=['a', 'b'])
Out[50]:
   a   b
0  1   2
1  5  10
1.3.2 Query a dataframe
To select rows whose column value equals a scalar some_value, use ==:
df.loc[df['column_name'] == some_value]
To select rows whose column value is in an iterable, some_values, use isin:
df.loc[df['column_name'].isin(some_values)]
Combine multiple conditions with & (wrap each comparison in parentheses, because & binds more tightly than ==):
df.loc[(df['column_name'] == some_value) & df['other_column'].isin(some_values)]
To select rows whose column value does not equal some_value, use !=:
df.loc[df['column_name'] != some_value]
isin returns a boolean Series, so to select rows whose value is not in some_values, negate the boolean Series using ~:
df.loc[~df['column_name'].isin(some_values)]
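The sketch below applies the query patterns above to a tiny DataFrame. The data and the values of `some_value` and `some_values` are made up for illustration.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({'column_name': ['x', 'y', 'x', 'z'],
                   'other_column': [1, 2, 3, 4]})
some_value = 'x'
some_values = [3, 4]

eq = df.loc[df['column_name'] == some_value]
print(eq.index.tolist())  # [0, 2]

ne = df.loc[df['column_name'] != some_value]
print(ne.index.tolist())  # [1, 3]

# The parentheses around the == comparison are required: without them Python
# would evaluate some_value & df['other_column'].isin(...) first
both = df.loc[(df['column_name'] == some_value) & df['other_column'].isin(some_values)]
print(both.index.tolist())  # [2]

not_in = df.loc[~df['other_column'].isin(some_values)]
print(not_in.index.tolist())  # [0, 1]
```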
1.3.3 Basic Statistics & groupby
To be continued from here: http://pandas.pydata.org/pandas-docs/stable/dsintro.html#from-structured-or-record-array