1-dim persistent homology

  • Simplex: Vertex, Edge, Triangle, Tetrahedron
  • \(n\)-simplex: \( \Delta = \{ \sum_{i=0}^n t_i a_i | t_i \ge 0, \sum_{i=0}^n t_i = 1\} \)
  • Convex Hull (minimal convex set)
  • Standard \(n\)-simplex: take the standard basis vectors of \(\mathbb{R}^{n+1}\) as the \(a_i\)
  • Face of a simplex: a sub-simplex, i.e., the subset where some of the \(t_i\) equal \(0\).
  • A (2-dimensional) simplicial complex \(K\) is a collection of simplices (vertices, edges, triangles) such that:
  • 1. The intersection of two simplices is a face of each simplex.
  • 2. If a simplex is in \(K\), all of its faces are in \(K\).
  • 1st homology: counting the number of cycles (or holes) without redundancy.
  • A cycle is "redundant" if it is a linear combination of other cycles or it is the boundary of a simplex.
  • "Equivalence up to continuous deformations"
  • Linear combinations: compare with the \(\mathbb{Z}_2\) interpretation from graph theory (here, orientation matters).
  • Trivial Cycles
  • Example: all of \(\mathbb{R}^2\) except the origin; think of an ant walking around the missing point.
  • The 1st homology of \(S^1\) is \(\mathbb{Z}\).
  • The 1st homology of a torus is \(\mathbb{Z} \oplus \mathbb{Z}\).
  • The "two independent cycles" of a torus. 
  • The 1st homology of a Klein Bottle is \(\mathbb{Z} \oplus \mathbb{Z}/2\mathbb{Z}\).
  • One cycle can change its orientation with a continuous deformation: \(\mathbb{Z}/2\mathbb{Z}\) 
  • Immersion, Embedding
  • 1st Betti Number = rank of 1st homology group
  • \(H_1\): 1st homology group
  • \( \cdots \rightarrow C_n \rightarrow C_{n-1} \rightarrow \cdots \rightarrow C_0 \rightarrow 0\)
  • Chain groups \(C_n\): free abelian groups generated by the oriented \(n\)-simplices
  • \(C_n \rightarrow C_{n-1}\) via \(\partial_n\), the boundary operator
  • Boundary operator: sends a simplex to its faces, with the induced (outward) orientation
  • \(H_n = \text{ker} \partial_n / \text{im} \partial_{n+1}\)
  • Looking back at \(H_1 = \text{ker} \partial_1 / \text{im} \partial_2 \)
  • \(\text{im} \partial_2\) consists of the boundaries of elements of \(C_2\), i.e., of filled-in regions.
  • These should be killed: a continuous deformation can contract such a cycle.
  • Passing to the quotient (thinking in terms of cosets) amounts to taking linear combinations.
  • Why do we take \(\text{ker} \partial_1\)? 
  • We want to look at cycles - and that is precisely \(\text{ker} \partial_1\).
  • Conclusion: \(H_1\) = cycles modulo filled-in cycles (a worked example follows this list)
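
A quick worked example (a standard sanity check, not from the seminar itself): take the hollow triangle with vertices \(a, b, c\) and oriented edges \([ab], [bc], [ca]\). Then $$\partial_1\big([ab] + [bc] + [ca]\big) = (b - a) + (c - b) + (a - c) = 0,$$ so \(z = [ab] + [bc] + [ca]\) lies in \(\text{ker} \partial_1\). There are no 2-simplices, so \(\text{im} \partial_2 = 0\) and \(H_1 \cong \mathbb{Z}\), generated by \(z\). Filling in the 2-simplex \([abc]\) gives \(\partial_2 [abc] = [bc] - [ac] + [ab] = z\), so \(z\) becomes a boundary and \(H_1 = 0\): the cycle dies once it is filled.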

Graph Filtration, Again

  • For a fixed parameter \(e\), consider the simplicial complex \(X_e\).
  • Its nodes are those of \(X\), and its edges are the edges of \(X\) of length at most \(e\).
  • The triangles are the triples of vertices whose three edges are all present.
  • Now we can track the birth/death of a feature (a component or a cycle).
  • Keep track of the 1st Betti number as the parameter changes (a computational sketch follows this list)
  • Bottleneck distance: a distance between persistence diagrams
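
A minimal computational sketch of this step (my own illustration, not code from the seminar): build the 2-skeleton of the Vietoris-Rips complex \(X_e\) for a point cloud and read off the 1st Betti number from the ranks of the boundary matrices over \(\mathbb{Z}_2\). The function names and the square-shaped example data are made up.

```python
# 1st Betti number of the 2-skeleton of a Vietoris-Rips complex X_e over Z/2,
# using only NumPy: beta_1 = dim ker(d1) - rank(d2) = |E| - rank(d1) - rank(d2).
from itertools import combinations
import numpy as np

def gf2_rank(M):
    """Rank of a 0/1 matrix over the two-element field (Gaussian elimination)."""
    M = M.copy() % 2
    rank, rows, cols = 0, M.shape[0], M.shape[1]
    for c in range(cols):
        pivot = next((r for r in range(rank, rows) if M[r, c]), None)
        if pivot is None:
            continue
        M[[rank, pivot]] = M[[pivot, rank]]   # move the pivot row up
        for r in range(rows):
            if r != rank and M[r, c]:
                M[r] ^= M[rank]               # clear the rest of the column
        rank += 1
    return rank

def rips_betti_1(points, e):
    points = np.asarray(points, dtype=float)
    n = len(points)
    dist = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    edges = [(i, j) for i, j in combinations(range(n), 2) if dist[i, j] <= e]
    edge_index = {edge: k for k, edge in enumerate(edges)}
    triangles = [t for t in combinations(range(n), 3)
                 if all(p in edge_index for p in combinations(t, 2))]
    d1 = np.zeros((n, len(edges)), dtype=np.uint8)               # vertices x edges
    for k, (i, j) in enumerate(edges):
        d1[i, k] = d1[j, k] = 1
    d2 = np.zeros((len(edges), len(triangles)), dtype=np.uint8)  # edges x triangles
    for k, tri in enumerate(triangles):
        for p in combinations(tri, 2):
            d2[edge_index[p], k] = 1
    return len(edges) - gf2_rank(d1) - gf2_rank(d2)

# Four corners of a unit square: at e = 1 the boundary cycle exists but is not filled,
# so beta_1 = 1; at e = 1.5 the diagonals and the triangles appear and the cycle dies.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(rips_betti_1(square, 1.0))   # -> 1
print(rips_betti_1(square, 1.5))   # -> 0
```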



The Dynamics of Data: Understanding Markov Chains

  • Markov Chain
  • A stochastic process \( X = \{X_n\} \) in a countable space \(S\) is a discrete time Markov Chain if 
  • For all \(n\), \(X_n \in S\).
  • For all \(n, i_0, i_1, \cdots , i_n\), we have $$P(X_n = i_n \mid X_{n-1}=i_{n-1}, \cdots, X_0 = i_0) = P(X_n = i_n \mid X_{n-1}=i_{n-1}) $$
  • Intuition: the chain depends only on the immediately preceding state.
  • Transition Matrix
  • Steady States (Stationary Distribution), Eigenvalues, Perron-Frobenius Theorem (see the sketch after this list)
  • Simple/General Random Walk (in \(\mathbb{Z}^2\))
  • Expectation, Independence of Random Variables, Translation Distance
  • Markov chain on a finite graph
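
A minimal sketch (my own illustration, not from the seminar): the stationary distribution as a left eigenvector of the transition matrix with eigenvalue \(1\). The 3-state matrix below is made up.

```python
# Stationary distribution of a small Markov chain via the eigendecomposition of P^T.
import numpy as np

P = np.array([[0.9, 0.1, 0.0],   # row i = distribution of X_{n+1} given X_n = i
              [0.2, 0.6, 0.2],
              [0.0, 0.3, 0.7]])

# Perron-Frobenius: for an irreducible chain, 1 is a simple eigenvalue and the
# corresponding left eigenvector can be chosen with non-negative entries.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
pi = pi / pi.sum()               # normalize to a probability distribution
print(pi)                        # the steady state
print(pi @ P)                    # equals pi again: pi P = pi
```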



The World Where Data Lives: An Introduction to Graph Theory and Network Theory

  • Network Theory
  • degree = degree centrality = # edges adjacent to vertex
  • cliques, connected components
  • regular graph
  • hub = a node with high degree (maximum degree centrality?)
  • Centrality: ways to determine hubs (see the first sketch after this list)
  • 1. Degree Centrality = degree of a node 
  • every edge counts as much as any other edge - the "weights" are not taken into account
  • 2. Eigenvector Centrality = the node's component of the eigenvector whose eigenvalue has maximum absolute value
  • the matrix here is the (vertex) adjacency matrix
  • such an eigenvector is unique (up to scaling) by the Perron-Frobenius Theorem
  • this eigenvector has all of its components non-negative
  • 3. Closeness Centrality = inverse/reciprocal of average shortest path lengths
  • a short average shortest-path length implies high centrality, i.e., the node is a hub
  • Efficiency
  • 1. Global Efficiency: \(E(G) = \frac{1}{n(n-1)} \cdot \sum_{i \neq j} \frac{1}{d_G(i,j)}\)
  • \(E_{glob}(G) = \frac{E(G)}{E(G^{ideal})}\), where \(G^{ideal}\) is a graph with all possible edges.
  • 2. Local Efficiency: \(E_{loc}(G) = \frac{1}{n} \sum_{i \in G} E(G_i)\)
  • Here, \(G_i\) is the subgraph induced by the neighbors of \(i\), excluding \(i\) itself.
  • 3. Clustering Coefficient: Global, Local, Network Average
  • the ratio of the number of (closed) triangles to the number of connected triplets
  • Generating Artificial Complex Networks (see the second sketch after this list)
  • Regular Networks: fixed degree - choose adjacent edges randomly
  • Small World Network (Watts-Strogatz): most nodes are not adjacent, yet typical path lengths are short
  • Ring lattice with \(k\) neighbors, and reconnect each edge with probability \(p\).
  • Random Network (Erdos-Renyi model): \(P(\text{deg}(v) = k) = \binom{n-1}{k} p^k (1-p)^{n-1-k}\)
  • cf. Percolation - Kolmogorov's Zero-One Law
  • Scale Free Network (Barabasi-Albert model): preferential attachment
  • a new node with a few edges is added at each step; higher degree implies a higher probability of receiving one of the new edges
  • degree distribution follows the power law - \(P(k) \sim k^{-\gamma}\)
  • Hyperbolic Network: Spatial Network
  • Vertices are sprinkled in hyperbolic space
  • Edges are connected if the distance between the two vertices is small enough
  • a short note on hyperbolic geometry
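
First sketch (my own illustration, not from the seminar): degree, eigenvector, and closeness centrality plus global efficiency, computed with plain NumPy on a small made-up adjacency matrix.

```python
# Centralities and global efficiency of a small unweighted, connected graph.
import numpy as np

A = np.array([[0, 1, 1, 1, 0],   # node 0 is the intended hub
              [1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [1, 0, 0, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
n = len(A)

# 1. Degree centrality: number of incident edges.
degree = A.sum(axis=1)

# 2. Eigenvector centrality: eigenvector for the largest eigenvalue of A
#    (non-negative and unique up to scale by Perron-Frobenius on a connected graph).
eigvals, eigvecs = np.linalg.eigh(A)
v = np.abs(eigvecs[:, np.argmax(eigvals)])
eigen_centrality = v / v.sum()

# All-pairs shortest paths by Floyd-Warshall relaxation.
D = np.where(A > 0, 1.0, np.inf)
np.fill_diagonal(D, 0.0)
for k in range(n):
    D = np.minimum(D, D[:, [k]] + D[[k], :])

# 3. Closeness centrality: reciprocal of the average shortest-path length.
closeness = (n - 1) / D.sum(axis=1)

# Global efficiency E(G): average of 1/d(i, j) over ordered pairs i != j.
E_glob = (1.0 / D[~np.eye(n, dtype=bool)]).mean()

print(degree, eigen_centrality.round(3), closeness.round(3), round(E_glob, 3))
```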
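Second sketch (again my own, simplified illustration): bare-bones Watts-Strogatz and Barabasi-Albert generators following the descriptions above. The function names, parameters, and rewiring details are assumptions made for the sketch.

```python
# Toy generators for a small-world network and a scale-free network, returning edge lists.
import random

def watts_strogatz(n, k, p, seed=0):
    """Ring lattice with k nearest neighbors per node; rewire each edge with probability p."""
    rng = random.Random(seed)
    edges = {tuple(sorted((i, (i + j) % n))) for i in range(n) for j in range(1, k // 2 + 1)}
    rewired = set()
    for (u, v) in sorted(edges):
        if rng.random() < p:                         # rewire the far endpoint at random
            choices = [w for w in range(n)
                       if w != u and tuple(sorted((u, w))) not in rewired]
            v = rng.choice(choices)
        rewired.add(tuple(sorted((u, v))))
    return sorted(rewired)

def barabasi_albert(n, m, seed=0):
    """Preferential attachment: each new node attaches to m existing nodes,
    chosen with probability roughly proportional to their current degree."""
    rng = random.Random(seed)
    edges = []
    repeated = list(range(m))                        # a node appears here once per unit of "attractiveness"
    for new in range(m, n):
        chosen = set()
        while len(chosen) < m:                       # m distinct, degree-biased targets
            chosen.add(rng.choice(repeated))
        for t in chosen:
            edges.append((t, new))
            repeated += [t, new]
    return edges

print(len(watts_strogatz(20, 4, 0.1)))   # about n*k/2 = 40 edges
print(len(barabasi_albert(20, 2)))       # exactly m*(n - m) = 36 edges
```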



The Topological Space Where Data Lives: Understanding Metric Spaces and Topological Spaces

  • Clustering
  • Single-Linkage Clustering: repeatedly merge the two clusters joined by the shortest remaining edge (essentially Kruskal's algorithm)
  • Single-Linkage Matrix - each entry is the distance between objects
  • Many other variants exist: Complete-Linkage, Average-Linkage, Centroid-Linkage, etc.
  • Minimum Spanning Tree (it really does show up here)
  • Metric Space = set + distance function
  • \(d(x,y)=0 \iff x=y\)
  • \(d(x,y)=d(y,x)\)
  • \(d(x,z) \le d(x,y)+d(y,z)\)
  • Topological Space = set + topology
  • Topology = set of open subsets
  • definition of a topology and open sets: closed under arbitrary unions and finite intersections
  • short note on homeomorphism 
  • short note on discrete topology
  • path-connected: any two points are the endpoints of a continuous curve in the space
  • continuity on a topology
  • number of path-connected components = 0th Betti number
  • Topological Data Analysis: Graph Filtration
  • Fix a parameter \(e\) that varies from \(0\) to the maximum edge length
  • Consider the graph \(X_e\) whose nodes are those of \(X\), but whose edges are only the edges of length at most \(e\)
  • The resulting graph may have several connected components, since we have removed some edges of \(X\)
  • The 0th Betti number is a (monotone decreasing) function of \(e\).
  • Record all parameter values at which this function changes - the birth/death of a component (see the first sketch after this list)
  • A component that survives for a long time can naturally be regarded as an important one.
  • Standardized Moments
  • \(\mu_k = E((X-\mu)^k)\), \(\sigma^k = \left(\sqrt{E((X-\mu)^2)}\right)^k\)
  • Consider \(\displaystyle \frac{\mu_k}{\sigma^k}\): variance, skewness, kurtosis (the second sketch after this list computes these)
  • Where is the center, how spread out is the data, how skewed is it, how heavy are the tails?
  • How do we differentiate between two data sets using filtration?
  • via slope, kurtosis, dendrograms
  • focus on persistent components that "last long" during the filtration process
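
First sketch (my own illustration, not from the seminar): the 0th-Betti-number filtration above, implemented as single-linkage merging with a union-find structure. The function name and the toy data are made up.

```python
# Birth/death of connected components in a graph filtration (0-dimensional persistence).
# Every component is born at e = 0 and dies when it merges into another one.
from itertools import combinations
import numpy as np

def zeroth_persistence(points):
    points = np.asarray(points, dtype=float)
    n = len(points)
    parent = list(range(n))

    def find(x):                 # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Process edges in increasing order of length (single-linkage / Kruskal).
    edges = sorted((np.linalg.norm(points[i] - points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    deaths = []
    for e, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(float(e))   # one component disappears at this parameter value
    return deaths                     # n - 1 deaths; one component survives forever

# Two clusters on a line: three small deaths inside the clusters, then one large
# death (around 4.8) when the two clusters finally merge - the "important" feature.
data = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,)]
print(zeroth_persistence(data))
```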
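Second sketch: standardized moments of the death times, one simple way to summarize a filtration and compare two data sets. The example death times are written out by hand (those of the toy data above) so the snippet is self-contained.

```python
# k-th standardized moment E((X - mu)^k) / sigma^k of a sample.
import numpy as np

def standardized_moment(x, k):
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()        # population mean and standard deviation
    return np.mean((x - mu) ** k) / sigma ** k

deaths = np.array([0.1, 0.1, 0.1, 4.8])  # death times from the previous sketch
print(standardized_moment(deaths, 3))    # skewness: positive, one heavy right tail
print(standardized_moment(deaths, 4))    # kurtosis
```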



The World Where Data Lives: Euclidean Space

  • Discussion of Euclidean space (equipped with the dot product)
  • Dimension Reduction (Visualization)
  • visualizing data in \( \mathbb{R}^{100} \)  as a picture in \(\mathbb{R}^2\)
  • A1. Principal Component Analysis (see the first sketch after this list)
  • a linear projection onto a subspace, chosen so that the variance of the projected data is maximized
  • covariance matrix (or correlation matrix)
  • take the top eigenvectors of this matrix (the eigenvectors with maximum \(|\lambda|\))
  • the subspace is the one spanned by those eigenvectors
  • A2. Fisher Linear Discriminant (Linear Discriminant Analysis)
  • group & label each data set
  • take the average of each group
  • maximize a function that will give a large separation between the projected group means
  • also, we want a small variance within each class, thereby minimizing the class overlap
  • \(J(W) = \frac{(m_2-m_1)^2}{s^2_1+s^2_2}\) should be our target (maximize this value)
  • A3. t-SNE algorithm 
  • construct a probability distribution over pairs of high-dimensional objects
  • similar objects have a high probability of being picked
  • dissimilar objects have an extremely small probability of being picked
  • define a similar probability distribution over the points in the low-dimensional map
  • minimize the Kullback-Leibler Divergence between the two distributions 
  • of course, the divergence should be calculated wrt the locations of the points in the map
  • think of the correlation as a "distance" - we want to find a map that preserves this distance well
  • \(D_{KL} (P \| Q) = - \sum_{x \in \mathcal{X}} P(x) \log \left( \frac{Q(x)}{P(x)} \right) \) (the second sketch after this list evaluates this)
  • this formula resembles entropy
  • \(\sum_{i=1}^n p_i \cdot (- \log p_i)\) - here, \((-\log p_i)\) is information, and we take the weighted average
  • The t-SNE algorithm is extremely useful in practice
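
First sketch (my own illustration, not from the seminar): PCA exactly as described above, on made-up data in \(\mathbb{R}^{100}\) that secretly lies near a 2-dimensional subspace.

```python
# PCA: center the data, take the top eigenvectors of the covariance matrix,
# and project onto the subspace they span.
import numpy as np

def pca(X, d=2):
    """Project the rows of X (samples x features) onto the top d principal components."""
    Xc = X - X.mean(axis=0)                  # center each feature
    cov = np.cov(Xc, rowvar=False)           # covariance matrix (features x features)
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues of a symmetric matrix
    top = eigvecs[:, np.argsort(eigvals)[::-1][:d]]   # d eigenvectors with largest eigenvalues
    return Xc @ top                          # coordinates in the variance-maximizing subspace

# Synthetic example: 200 points that live near a 2-dimensional subspace of R^100, plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
directions = rng.normal(size=(2, 100))
X = latent @ directions + 0.01 * rng.normal(size=(200, 100))
print(pca(X, d=2).shape)                     # (200, 2) - the picture we would actually plot
```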
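Second sketch: evaluating the Kullback-Leibler divergence and the entropy formula quoted above for two made-up discrete distributions.

```python
# D_KL(P || Q) = -sum_x P(x) log(Q(x)/P(x)), and entropy as the weighted average of -log p_i.
import numpy as np

P = np.array([0.7, 0.2, 0.1])
Q = np.array([0.5, 0.3, 0.2])

kl = -np.sum(P * np.log(Q / P))     # always >= 0, and 0 only when P == Q
entropy = np.sum(P * (-np.log(P)))  # -log p_i is the information of outcome i
print(kl, entropy)
```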


