【深基6.例7】[NOIP2011 Junior Group] Word Count

李荣轩 LV 6 @ 2026-3-22 17:44:19

Problem Analysis

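The core of this problem is exact word matching. The key requirements are:

Case-insensitive: convert the word and the article to the same case (all lowercase or all uppercase) before matching;
Whole-word matching: the target must appear in the article as a complete, standalone word (with a space or the start/end of the article on each side), not as part of a longer word;
Efficiency: the article can be up to 10⁶ characters long, so a single left-to-right scan that does only a small, bounded amount of work per position is needed; the scan records both the number of occurrences and the position of the first one.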

Complete C++ Implementation

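#include <iostream>
#include <string>
#include <cctype> // for tolower

using namespace std;

// Convert a string to lowercase so the comparison ignores case
string toLower(const string& s) {
    string res;
    for (char c : s) {
        // Cast to unsigned char first: tolower is undefined for negative char values
        res += static_cast<char>(tolower(static_cast<unsigned char>(c)));
    }
    return res;
}

int main() {
    // Turn off stdio synchronization and untie cin from cout to speed up large input
    ios::sync_with_stdio(false);
    cin.tie(nullptr);
    
    string target_word, article;
    // Read the target word (cin >> skips leading whitespace and reads a single word)
    cin >> target_word;
    // Skip the newline after the word, then read the whole article line (spaces included)
    cin.ignore();
    getline(cin, article);
    
    // Lowercase both strings to remove case differences
    string target = toLower(target_word);
    string text = toLower(article);
    
    int count = 0;          // number of occurrences
    int first_pos = -1;     // position of the first occurrence
    int target_len = target.size();
    int text_len = text.size();
    
    // Check every position where the word could start
    for (int i = 0; i <= text_len - target_len; ++i) {
        // Condition 1: the substring starting at i equals the target word
        bool match = (text.substr(i, target_len) == target);
        
        // Condition 2: it is a standalone word (space or article boundary on both sides)
        bool is_independent = false;
        // Case 1: at the start of the article, followed by a space or filling the whole article
        if (i == 0) {
            if (i + target_len == text_len || text[i + target_len] == ' ') {
                is_independent = true;
            }
        }
        // Case 2: at the end of the article, preceded by a space
        else if (i + target_len == text_len) {
            if (text[i - 1] == ' ') {
                is_independent = true;
            }
        }
        // Case 3: in the middle, with spaces on both sides
        else {
            if (text[i - 1] == ' ' && text[i + target_len] == ' ') {
                is_independent = true;
            }
        }
        
        // Both conditions hold: count this occurrence
        if (match && is_independent) {
            count++;
            // Record the first position (indices are the same in the original and lowercased article)
            if (first_pos == -1) {
                first_pos = i;
            }
        }
    }
    
    // Print the result
    if (count > 0) {
        cout << count << " " << first_pos << endl;
    } else {
        cout << -1 << endl;
    }
    
    return 0;
}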

Code Walkthrough

1. Case normalization

string toLower(const string& s) {
    string res;
    for (char c : s) {
        res += static_cast<char>(tolower(static_cast<unsigned char>(c)));
    }
    return res;
}

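This helper converts every character of the string to lowercase, which removes case differences (for example, To and to both become to);
converting to uppercase works just as well; what matters is that both strings use the same case when they are compared.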

2. Input handling

cin >> target_word;
cin.ignore();
getline(cin, article);

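cin >> target_word: reads the target word (skips leading whitespace and stops at the next whitespace, which fits the rule that the word contains only letters);
cin.ignore(): discards the newline that cin >> leaves in the buffer (otherwise getline would read an empty line). It discards exactly one character, so it assumes the word is immediately followed by '\n'; if the first line may have trailing spaces or end with '\r' (Windows line endings), cin.ignore(numeric_limits<streamsize>::max(), '\n') from <limits> is the safer call, and a trailing '\r' on the article line would still need to be stripped;
getline(cin, article): reads the whole article line, spaces included, which matches the article's input format.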

3. Core matching logic

for (int i = 0; i <= text_len - target_len; ++i) {
    // Substring match
    bool match = (text.substr(i, target_len) == target);
    // Whole-word check
    bool is_independent = false;
    // Three cases decide whether it is a standalone word
    if (i == 0) {
        is_independent = (i + target_len == text_len || text[i + target_len] == ' ');
    } else if (i + target_len == text_len) {
        is_independent = (text[i - 1] == ' ');
    } else {
        is_independent = (text[i - 1] == ' ' && text[i + target_len] == ' ');
    }
    
    if (match && is_independent) {
        count++;
        if (first_pos == -1) first_pos = i;
    }
}

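Substring match: substr(i, target_len) extracts the target_len characters starting at index i, which are then compared with the target word;
Whole-word check:
1. Word at the start of the article: followed by a space, or the word fills the entire article;
2. Word at the end of the article: preceded by a space;
3. Word in the middle: spaces on both sides;
Counting and first position: every match increments the count, and the first match also records its index (the index is the same in the original article and in the lowercased copy).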

4. Output

if (count > 0) {
    cout << count << " " << first_pos << endl;
} else {
    cout << -1 << endl;
}

If there is at least one match, print the count and the first position; otherwise print -1.

Test Cases

Sample 1

Input:

To
to be or not to be is a question

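After lowercasing:

Target word: to;
Article: to be or not to be is a question;

Matching:

i=0: substring to matches, at the start and followed by a space → count = 1, first position = 0;
i=13: substring to matches, with spaces on both sides → count = 2;
Output: 2 0 (matches the expected output).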

Sample 2

Input:

to
Did the Ottoman Empire lose its power at that time

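Article after lowercasing: did the ottoman empire lose its power at that time;

There is no standalone to (the to inside ottoman is part of a longer word, not a whole word) → output -1 (matches the expected output).

Edge Cases

Word fills the whole article: input test + Test → output 1 0;
Word at the end: input end + this is the end → output 1 12;
Mixed case: input He + hE is HE he → output 3 0 (the article contains only letters and spaces; with a comma, as in hE is HE, he, the code would print 2 0, because HE is then followed by a comma rather than a space).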

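Summary

Core logic: normalize the case first, then scan the article and check two conditions (substring match and whole-word boundaries) while recording the count and the first position;
Key details:
- Cover all three boundary cases of a standalone word (start, end, middle);
- Read the article with getline so its spaces are preserved;
- Turn off cin/cout synchronization so large input is read quickly;
Efficiency: the loop visits each of the n positions of the article once, and each visit compares at most m ≤ 10 characters (the substr copy of such a short word adds only a small constant cost), so the total work is O(n·m), at most about 10⁷ character comparisons for n = 10⁶, which fits the limits comfortably.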

ID: 106
Time limit: 1000 ms
Memory limit: 256 MiB
Uploader: 大白老师
