How to split a string around a separator with Python's partition() method

2023.01.18 2022.07.25

In this article, we look at what Python's string partition() method does.

Python's string type provides many built-in methods for manipulating input strings and data.

Getting started with Python String partition()
Python NumPy partition() method
Python Pandas partition() method
Summary

Getting started with Python String partition()

The Python String partition() method splits the input string at the first occurrence of the separator.

The syntax is as follows.

input_string.partition(separator)

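Explanation

partition() scans the input string for the separator passed as its argument and splits the string at the first match.

It returns a tuple with three parts:

The part of the string before the separator
The separator itself
The part of the string after the separator

Example 1: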

inp_str = "Evan and Norah are good friends and they study in the same college."
 
res1 = inp_str.partition('friends')

print(res1, '\n')

In the code above, inp_str is split at the separator 'friends'.

Output:

('Evan and Norah are good ', 'friends', ' and they study in the same college.')

Example 2:

inp_str = "Evan and Norah are good friends and they study in the same college."
 
res2 = inp_str.partition('black')

print(res2, '\n')

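In the snippet above, we try to split the string at the separator 'black'.

However, 'black' does not occur in the input string, so the method returns a tuple containing the whole input string followed by two empty strings.

Output: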

('Evan and Norah are good friends and they study in the same college.', '', '')
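Because the middle element of the tuple is empty when the separator is missing, it can be used to test whether a split actually happened. The following is a minimal sketch; the helper name split_once and the sample strings are made up for illustration and are not part of Python itself.

```python
def split_once(text, separator):
    """Return (head, tail) if separator occurs in text, otherwise None."""
    head, found, tail = text.partition(separator)
    if not found:  # the middle element is '' when the separator is missing
        return None
    return head, tail


print(split_once("name: Evan", ": "))         # ('name', 'Evan')
print(split_once("no separator here", ": "))  # None
```

Checking the middle element avoids confusing "separator not found" with "separator found at the very end", since in both cases the last element is an empty string.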

Example 3:

inp_str = "Evan and Norah are good friends and they study in the same college."
 
res3 = inp_str.partition('and') 

print(res3, '\n')

In the example above, the separator 'and' occurs twice in the input string.

In this case, partition() splits the input string around the first occurrence of the separator only.

Output:

('Evan ', 'and', ' Norah are good friends and they study in the same college.')
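The first-occurrence behavior is convenient when only the first separator is meaningful, for example in a key/value line whose value may itself contain the separator. A small sketch with a made-up input line:

```python
# Hypothetical key/value line; the value itself contains another '='.
line = "path=/usr/bin=backup"

# Tuple unpacking gives names to the three parts.
key, sep, value = line.partition("=")

print(key)    # path
print(value)  # /usr/bin=backup
```

Only the first '=' is treated as the separator; everything after it, including later '=' characters, ends up in the third element.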


Python NumPy partition() method

The NumPy module provides the numpy.partition() method for partitioning an input array.

numpy.partition() partitions the input array around the n-th element given as an argument, as follows.

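numpy.partition() first creates a copy of the input array and rearranges it so that the n-th element lands in the position it would occupy in a sorted array. The copy is not necessarily fully sorted.
Elements smaller than the n-th element are placed before it.
Elements equal to or greater than the n-th element are placed after it.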

The syntax is as follows.

numpy.partition(a, kth, axis=-1, kind='introselect', order=None)

kth: the index of the element to partition around.
kind: the selection algorithm to use. The default is 'introselect'.
axis: the axis along which to partition. The default is -1 (the last axis).
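A common use of kth is picking the k smallest values without sorting the whole array. The sketch below assumes a small 1-D integer array; the variable names scores and k are chosen here for illustration.

```python
import numpy as np

scores = np.array([10, 90, 0, 50, 12, 100, -87])
k = 3

# Partitioning at index k - 1 guarantees that the first k slots hold
# the k smallest values, although their order among themselves is not fixed.
smallest = np.partition(scores, k - 1)[:k]

# Sort only the small slice to get a stable, readable result.
print(sorted(smallest.tolist()))  # [-87, 0, 10]
```

Sorting just the k selected values is cheaper than sorting the full array when k is much smaller than the array length.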

Example:

import numpy
 
inp = numpy.array([10, 90, 0, 50, 12, 100, -87]) 

print("Elements of input array before partition:\n", inp)
 
res = numpy.partition(inp, 1) 

print("Elements of array after partition:\n", res)

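In the code above, partition() creates a copy of the input array and rearranges it internally using a selection algorithm.

For reference, the fully sorted input array would be [-87, 0, 10, 12, 50, 90, 100].

The partition is made around the element at index 1 of that sorted array, which is 0.

Elements smaller than 0 are placed before (to the left of) it, and elements greater than or equal to 0 are placed to its right.

Note: the order of the elements within each of the two partitions is undefined.

The result is as follows.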

Elements of input array before partition:

 [ 10  90   0  50  12 100 -87]
Elements of array after partition:

 [-87   0  90  50  12 100  10]
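Since the order inside each partition is undefined, it is safer to check the guarantee itself than to compare against one particular printed output. A minimal sketch that reuses the same input array:

```python
import numpy as np

inp = np.array([10, 90, 0, 50, 12, 100, -87])
kth = 1
res = np.partition(inp, kth)

pivot = res[kth]

# The element at index kth is the one a full sort would put there.
assert pivot == np.sort(inp)[kth]
# Everything before it is no larger; everything after it is no smaller.
assert np.all(res[:kth] <= pivot)
assert np.all(res[kth + 1:] >= pivot)

print(pivot)  # 0
```

The original array inp is left unchanged, because numpy.partition() works on a copy.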


Python Pandas partition() method

The Pandas module provides the Series.str.partition() method for splitting each string in a Series at the first occurrence of a separator.

The syntax is as follows.

Series.str.partition(sep=' ', expand=True)

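sep: the string to split on. The default is a single space.
expand: if True, returns a DataFrame with three columns (the part before the separator, the separator itself, and the part after it). If False, returns each split string as a tuple. The default is True.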
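The effect of expand is easiest to see on a small in-memory Series. The sketch below uses made-up sample strings rather than a file; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical sample data standing in for a text column.
details = pd.Series(["name:Evan", "city:Paris:France", "no separator"])

# expand=True (the default) yields a DataFrame with three columns.
expanded = details.str.partition(":")
print(expanded.shape)  # (3, 3)

# expand=False yields a Series whose values are 3-tuples.
as_tuples = details.str.partition(":", expand=False)
print(as_tuples[1])  # ('city', ':', 'Paris:France')
print(as_tuples[2])  # ('no separator', '', '')
```

As with str.partition(), only the first ':' in each string is used, and a string without the separator comes back whole with two empty strings.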

The following example reads an input CSV file and partitions its Details column.

import pandas
 
res = pandas.read_csv(r"C:\Users\HP\Desktop\Book1.csv") 
 
res["Details"] = res["Details"].str.partition(":", expand=False) 
 
print(res)

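With expand=False, each value in the Details column is replaced by a 3-tuple: the text before the first ':', the ':' itself, and the text after it.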


Summary

In this article, we looked at how Python's partition() methods behave in several scenarios: on strings, on NumPy arrays, and on Pandas Series.