Python Examples: How Do You Compute the Intersection of Data?

wen Python Examples 2026-06-09 21

Contents of this article:

Intersection of lists
Intersection of sets
Intersection of dictionaries
Intersection of multiple data sets
Practical examples
Performance advice

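Python offers several ways to compute an intersection, and the right one depends mainly on the data structure you are working with. The examples below cover the most common cases.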

Intersection of lists

Method 1: convert to sets

list1 = [1, 2, 3, 4, 5]
list2 = [4, 5, 6, 7, 8]
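# Convert both lists to sets, intersect them, and convert back to a list
intersection = list(set(list1) & set(list2))
print(intersection)  # Output: [4, 5] (sets are unordered, so the order is not guaranteed)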

Method 2: list comprehension (preserves order)

list1 = [1, 2, 3, 4, 5]
list2 = [4, 5, 6, 7, 8]
# Keep the original order of list1
intersection = [item for item in list1 if item in list2]
print(intersection)  # Output: [4, 5]
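
The list comprehension above keeps every occurrence in list1 of a value that appears anywhere in list2, so duplicates are treated asymmetrically. If each value should instead appear as many times as it occurs in both lists, one possible approach is a multiset intersection with collections.Counter. This is only a minimal sketch; the function name multiset_intersection and the sample data are illustrative assumptions, not part of the original examples.

```python
from collections import Counter

def multiset_intersection(a, b):
    """Return elements common to a and b, keeping the smaller count of each."""
    # Counter & Counter keeps min(count_in_a, count_in_b) for every element
    common = Counter(a) & Counter(b)
    return list(common.elements())

result = multiset_intersection([1, 1, 2, 3, 3], [1, 3, 3, 3, 4])
print(sorted(result))  # [1, 3, 3]
```

Here 1 occurs once in the second list and 3 occurs twice in the first, so those are the counts that survive.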

Intersection of sets

set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7, 8}
# Method 1: use the & operator
intersection1 = set1 & set2
print(intersection1)  # Output: {4, 5}
# Method 2: use the intersection() method
intersection2 = set1.intersection(set2)
print(intersection2)  # Output: {4, 5}
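
The two methods differ in one practical way: the & operator requires both operands to be sets, while intersection() accepts any iterable. The sketch below illustrates that difference along with two related built-in set methods, isdisjoint() and intersection_update(). The variable names from_list and working are illustrative only.

```python
set1 = {1, 2, 3, 4, 5}

# intersection() accepts any iterable, so no explicit set() conversion is needed
from_list = set1.intersection([4, 5, 6])
print(from_list == {4, 5})  # True

# The & operator requires both operands to be sets
try:
    set1 & [4, 5, 6]
except TypeError:
    print("& does not accept a list")

# isdisjoint() answers "is the intersection empty?" without building it
print(set1.isdisjoint({6, 7}))  # True

# intersection_update() (or &=) modifies the set in place
working = set(set1)
working.intersection_update({2, 4, 9})
print(working == {2, 4})  # True
```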

Intersection of dictionaries

Intersection of keys

dict1 = {'a': 1, 'b': 2, 'c': 3}
dict2 = {'b': 4, 'c': 5, 'd': 6}
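# Intersect the keys; keys() views support set operations directly
common_keys = dict1.keys() & dict2.keys()
print(common_keys)  # Output: {'b', 'c'} (set order may vary)
# Look up the values for the shared keys (taken from dict1)
intersection_dict = {key: dict1[key] for key in common_keys}
print(intersection_dict)  # Output: {'b': 2, 'c': 3} (key order may vary)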
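
The example above compares keys only. If a match should require the value to be equal as well, one option is to intersect the items() views. The following is a hedged sketch: dict2 has been changed from the example above so that one key-value pair actually matches, and the names common_items and same_pairs are illustrative.

```python
dict1 = {'a': 1, 'b': 2, 'c': 3}
dict2 = {'b': 2, 'c': 5, 'd': 6}  # modified sample data: 'b' now has the same value

# items() views support set operations when all values are hashable
common_items = dict1.items() & dict2.items()
print(common_items)  # {('b', 2)}

# Alternative that also works when values are unhashable (e.g. lists)
same_pairs = {k: v for k, v in dict1.items() if k in dict2 and dict2[k] == v}
print(same_pairs)  # {'b': 2}
```

The comprehension also keeps the key order of dict1, which the set-based version does not.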

Intersection of multiple data sets

from functools import reduce

# Intersect several lists
list1 = [1, 2, 3, 4, 5]
list2 = [2, 4, 6, 8, 10]
list3 = [2, 4, 6]
# Fold the lists pairwise with reduce; each step intersects the running result with the next list
intersection = reduce(lambda x, y: set(x) & set(y), [list1, list2, list3])
print(intersection)  # Output: {2, 4}
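
As an alternative to reduce, set.intersection() takes any number of iterables in a single call. The sketch below uses the same three lists; the names datasets, first, rest, and common are illustrative assumptions.

```python
list1 = [1, 2, 3, 4, 5]
list2 = [2, 4, 6, 8, 10]
list3 = [2, 4, 6]

datasets = [list1, list2, list3]

# Convert only the first list; intersection() accepts the rest as plain iterables
first, *rest = datasets
common = set(first).intersection(*rest)
print(sorted(common))  # [2, 4]
```

Note that the unpacking raises ValueError when datasets is empty, so that case would need separate handling.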

Practical examples

Example: finding mutual friends

friends_a = ['Tom', 'Jerry', 'Spike', 'Tyke']
friends_b = ['Jerry', 'Spike', 'Butch', 'Toodles']
# Find the friends both people share
common_friends = set(friends_a) & set(friends_b)
print(f"Mutual friends: {common_friends}")
# Output: Mutual friends: {'Jerry', 'Spike'} (set order may vary)

Example: intersection of file contents

# Suppose these strings hold the contents of two files
file1_content = "apple,banana,orange,grape"
file2_content = "banana,grape,watermelon,kiwi"
# Split each string into a list and intersect the results
list1 = file1_content.split(',')
list2 = file2_content.split(',')
intersection = set(list1) & set(list2)
print(f"Common fruits: {intersection}")
# Output: Common fruits: {'banana', 'grape'} (set order may vary)
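
The example above works on strings that are already in memory. For line-oriented files, a similar idea can be sketched as follows. This is an assumption-laden illustration: the helper common_lines is hypothetical, the data is laid out one item per line rather than comma-separated, and io.StringIO stands in for real files so the snippet runs without touching the disk.

```python
import io

def common_lines(file_a, file_b):
    """Return the set of non-empty, stripped lines present in both file objects."""
    lines_a = {line.strip() for line in file_a if line.strip()}
    lines_b = {line.strip() for line in file_b if line.strip()}
    return lines_a & lines_b

# io.StringIO stands in for files opened with open(path, encoding="utf-8")
file1 = io.StringIO("apple\nbanana\norange\ngrape\n")
file2 = io.StringIO("banana\n grape \nwatermelon\nkiwi\n")

print(sorted(common_lines(file1, file2)))  # ['banana', 'grape']
```

Stripping whitespace before comparing avoids missing a match because of a trailing newline or stray spaces.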

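Performance advice

Small data sets (roughly under 10,000 elements): either a list comprehension or a set conversion is fine.
Large data sets: prefer set operations, because a set membership test takes O(1) time on average, whereas `item in some_list` scans the list in O(n).
Need to preserve order: use a list comprehension, but expect it to be slow when the membership test runs against a list, since the total cost grows with the product of the two lengths.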
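
Order preservation and set-speed lookups are not mutually exclusive. A minimal sketch, assuming the elements are hashable: build a set from the second list once and test membership against it while iterating over the first list. The function name ordered_intersection is hypothetical, and unlike the plain list comprehension this version also drops repeated values.

```python
def ordered_intersection(first, second):
    """Return items of `first` that also occur in `second`, in first-seen order, without duplicates."""
    lookup = set(second)  # built once; membership tests are O(1) on average
    seen = set()          # values already emitted
    result = []
    for item in first:
        if item in lookup and item not in seen:
            seen.add(item)
            result.append(item)
    return result

print(ordered_intersection([5, 3, 5, 1, 4], [4, 5, 6, 7, 8]))  # [5, 4]
```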

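# Simple timing comparison
import time
big_list1 = list(range(100000))
big_list2 = list(range(50000, 150000))
# Set approach: two conversions plus one intersection
start = time.perf_counter()
intersection = set(big_list1) & set(big_list2)
print(f"Set approach took: {time.perf_counter() - start:.4f} s")
# List comprehension: each `in big_list2` scans the list, so this can take a long time
start = time.perf_counter()
intersection = [item for item in big_list1 if item in big_list2]
print(f"List comprehension took: {time.perf_counter() - start:.4f} s")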
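
For repeatable measurements, the standard-library timeit module is a common alternative to manual timestamps. The sketch below is an illustration under assumptions: it uses much smaller inputs than the test above so the list-scanning variant finishes quickly, and the names small1, small2, lookup, list_scan, and set_lookup are hypothetical. Actual timings depend on the machine, so none are shown.

```python
import timeit

# Smaller inputs than the test above so the slow variant finishes quickly
small1 = list(range(2000))
small2 = list(range(1000, 3000))
lookup = set(small2)  # same values as small2, but with O(1) average membership tests

# Each lambda is executed 5 times; timeit returns the total elapsed seconds
list_scan = timeit.timeit(lambda: [x for x in small1 if x in small2], number=5)
set_lookup = timeit.timeit(lambda: [x for x in small1 if x in lookup], number=5)

print(f"list membership: {list_scan:.4f} s")
print(f"set membership:  {set_lookup:.4f} s")
```

Both comprehensions produce the same ordered result; only the container used for the membership test differs.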

Which method to choose depends on your specific needs: the data type, whether order must be preserved, and how much data you have.