Java Program to Count the Number of Duplicate Words in a String
Introduction
Counting the number of duplicate words in a string is a useful task in text processing, often needed for analyzing text or preparing data for natural language processing. This exercise helps you understand how to split strings into words, use data structures like maps in Java, and identify duplicates efficiently. This guide will walk you through writing a Java program that counts the number of duplicate words in a given string.
Learn everything about Java: https://www.javaguides.net/
Problem Statement
Create a Java program that:
- Prompts the user to enter a string.
- Splits the string into words.
- Identifies the duplicate words in the string.
- Counts and displays the number of duplicate words.
Example:
- Input:
"Java is a programming language and Java is also an island. Java is popular."
- Output:
Duplicate words:
java: 3
is: 3
Number of duplicate words: 2
Solution Steps
- Read the String: Use the Scanner class to take the string as input from the user.
- Split the String into Words: Use the split() method to break the string into words.
- Store Word Counts: Use a HashMap to store each word of the string as a key and its count as the value.
- Identify and Count Duplicate Words: Iterate through the map to find words with a count greater than 1 and count the number of such words.
- Display the Duplicate Words and Their Counts: Print each duplicate word along with its count.
- Close Resources: Close the Scanner object automatically using the try-with-resources statement.
Java Program
// Java Program to Count the Number of Duplicate Words in a String
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class DuplicateWordCounter {
    public static void main(String[] args) {
        // Step 1: Read the string from the user
        try (Scanner scanner = new Scanner(System.in)) {
            System.out.print("Enter a string: ");
            String input = scanner.nextLine();

            // Step 2: Split the string into words
            String[] words = input.toLowerCase().split("\\s+");

            // Step 3: Store word counts
            Map<String, Integer> wordCountMap = new HashMap<>();
            for (String word : words) {
                wordCountMap.put(word, wordCountMap.getOrDefault(word, 0) + 1);
            }

            // Step 4: Identify and count duplicate words
            int duplicateWordCount = 0;
            System.out.println("Duplicate words:");
            for (Map.Entry<String, Integer> entry : wordCountMap.entrySet()) {
                if (entry.getValue() > 1) {
                    duplicateWordCount++;
                    System.out.println(entry.getKey() + ": " + entry.getValue());
                }
            }

            // Step 5: Display the number of duplicate words
            System.out.println("Number of duplicate words: " + duplicateWordCount);
        }
    }
}
Explanation
Step 1: Read the String
- The Scanner class is used to read a string input from the user. The nextLine() method captures the entire line as a string.
Step 2: Split the String into Words
- The split() method is used to divide the string into words based on whitespace. The regex \\s+ handles multiple spaces between words. The string is converted to lowercase first so that the comparison is case-insensitive.
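To make the splitting behavior concrete, here is a minimal sketch. The class SplitSketch and its helper tokenize are hypothetical names that are not part of the program above. It shows that \\s+ collapses runs of spaces and tabs, and that leading whitespace produces one empty first token unless the input is trimmed first.

```java
import java.util.Arrays;

class SplitSketch {
    // Hypothetical helper (not in the original program):
    // trims, lowercases, then splits on runs of whitespace.
    static String[] tokenize(String input) {
        String normalized = input.trim().toLowerCase();
        if (normalized.isEmpty()) {
            // "".split("\\s+") would return one empty token, so return no words instead
            return new String[0];
        }
        return normalized.split("\\s+");
    }

    public static void main(String[] args) {
        // Runs of spaces and tabs are treated as a single separator
        System.out.println(Arrays.toString(tokenize("Java   is \t FUN"))); // [java, is, fun]

        // Without trim(), leading whitespace yields an empty first token
        System.out.println(Arrays.toString(" Java is fun".split("\\s+"))); // [, Java, is, fun]
    }
}
```

In the program above the stray empty token is harmless, because it can occur at most once and therefore never counts as a duplicate.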
Step 3: Store Word Counts
- A HashMap is used to store the words as keys and their counts as values.
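As an illustration of the counting step, the sketch below uses Map.merge, a standard alternative to the getOrDefault and put pair in the program above. The class WordCountSketch and the method countWords are hypothetical names chosen for this example.

```java
import java.util.HashMap;
import java.util.Map;

class WordCountSketch {
    // Hypothetical helper: builds a word -> count map from an array of words.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            // Inserts 1 for a new key, otherwise adds 1 to the existing count
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = {"java", "is", "java", "fun", "java"};
        Map<String, Integer> counts = countWords(words);
        System.out.println("java -> " + counts.get("java")); // java -> 3
        System.out.println("is -> " + counts.get("is"));     // is -> 1
    }
}
```

Both forms produce the same map. merge simply expresses "insert or increment" in a single call.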
Step 4: Identify and Count Duplicate Words
- The program iterates through the HashMap entries and checks for words with a count greater than 1. Each such word is a duplicate, so the duplicate counter is incremented once for it.
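A HashMap does not guarantee any iteration order, so the order in which duplicates are printed can vary. The following sketch is a variation on the program above, not a replacement for it. It assumes a LinkedHashMap is acceptable and uses the hypothetical names DuplicateFilterSketch and duplicates, so that duplicates come back in the order they first appear in the input.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DuplicateFilterSketch {
    // Hypothetical helper: returns the words that occur more than once,
    // in order of first appearance.
    static List<String> duplicates(String[] words) {
        // LinkedHashMap iterates in insertion order, unlike HashMap
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }

        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] words = "java is a language and java is fun".split("\\s+");
        List<String> dups = duplicates(words);
        System.out.println(dups);        // [java, is]
        System.out.println(dups.size()); // 2
    }
}
```

The size of the returned list corresponds to the "Number of duplicate words" value printed by the program.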
Step 5: Display the Duplicate Words and Their Counts
- The program prints the total number of duplicate words and each duplicate word with its count.
Output
Enter a string: Java is a programming language and Java is also an island. Java is popular.
Duplicate words:
java: 3
is: 3
Number of duplicate words: 2
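Because the program splits only on whitespace, tokens such as island. and popular. keep their trailing period, so a word followed by punctuation is counted separately from the bare word. If that matters for your input, one possible approach is to strip punctuation from each token before counting. The sketch below assumes that removing every character in the regex class \\p{Punct} is acceptable. PunctuationSketch, stripPunctuation, and countNormalized are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class PunctuationSketch {
    // Hypothetical helper: removes ASCII punctuation such as . , ! ? from a token.
    static String stripPunctuation(String word) {
        return word.replaceAll("\\p{Punct}", "");
    }

    // Hypothetical helper: counts words after lowercasing and stripping punctuation.
    static Map<String, Integer> countNormalized(String input) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : input.trim().toLowerCase().split("\\s+")) {
            String word = stripPunctuation(token);
            if (!word.isEmpty()) { // skip tokens that were only punctuation or blank
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countNormalized("Java is fun. Java, too.");
        // "java" and "java," now map to the same key
        System.out.println("java -> " + counts.get("java")); // java -> 2
    }
}
```

Note that this also removes apostrophes and hyphens inside words, which may or may not be what you want.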
Conclusion
This Java program demonstrates how to count and display the number of duplicate words in a user-input string. It covers essential concepts such as string manipulation, using maps to store word counts, and iterating through collections, making it a valuable exercise for beginners learning Java programming.