Java Program to Find Duplicate Words in a String
Introduction
Finding duplicate words in a string is a useful task in text processing, often needed for analyzing text or preparing data for natural language processing. This exercise helps you understand how to split strings into words, use data structures like maps in Java, and identify duplicates efficiently. This guide will walk you through writing a Java program that identifies and counts the duplicate words in a given string.
Problem Statement
Create a Java program that:
- Prompts the user to enter a string.
- Splits the string into words.
- Finds and displays the words that appear more than once in the string.
- Displays the count of each duplicate word.
Example:
- Input:
"Java is great and Java is powerful"
- Output (words are compared case-insensitively and reported in lowercase):
Duplicate words:
java: 2
is: 2
Solution Steps
- Read the String: Use the Scanner class to take the string as input from the user.
- Split the String into Words: Use the split() method to break the string into words.
- Store Word Counts: Use a HashMap to store each word of the string as a key and its count as the value.
- Identify Duplicate Words: Iterate through the map to find words with a count greater than 1.
- Display the Duplicates: Print the duplicate words and their counts.
- Close Resources: Declare the Scanner in a try-with-resources statement so it is closed automatically.
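The same steps can also be expressed with the Stream API. The sketch below is an alternative to the loop-based program, not the approach this guide uses; the class name DuplicateWordsStream and the method findDuplicates are hypothetical, and it collects into a TreeMap (an assumption of this sketch) so the duplicates come out in alphabetical order.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class DuplicateWordsStream {

    // Hypothetical helper: counts lowercase words and keeps only repeated ones.
    public static Map<String, Long> findDuplicates(String input) {
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            return new TreeMap<>(); // nothing to count
        }
        Map<String, Long> counts = Arrays.stream(trimmed.split("\\s+"))
                .map(String::toLowerCase)
                // TreeMap keeps the words in alphabetical order
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
        counts.values().removeIf(count -> count < 2); // drop words seen only once
        return counts;
    }

    public static void main(String[] args) {
        findDuplicates("Java is great and Java is powerful")
                .forEach((word, count) -> System.out.println(word + ": " + count));
    }
}
```

Collectors.counting() produces Long values, which is why the map type here is Map<String, Long> rather than Map<String, Integer>.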
Java Program
// Java Program to Find Duplicate Words in a String
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class DuplicateWords {
    public static void main(String[] args) {
        // Step 1: Read the string from the user
        // (try-with-resources closes the Scanner automatically)
        try (Scanner scanner = new Scanner(System.in)) {
            System.out.print("Enter a string: ");
            String input = scanner.nextLine();

            // Step 2: Split the string into words
            String[] words = input.split("\\s+");

            // Step 3: Store word counts
            Map<String, Integer> wordCountMap = new HashMap<>();
            for (String word : words) {
                word = word.toLowerCase(); // Normalize to handle case-insensitivity
                wordCountMap.put(word, wordCountMap.getOrDefault(word, 0) + 1);
            }

            // Step 4: Identify and display duplicate words
            System.out.println("Duplicate words:");
            for (Map.Entry<String, Integer> entry : wordCountMap.entrySet()) {
                if (entry.getValue() > 1) {
                    System.out.println(entry.getKey() + ": " + entry.getValue());
                }
            }
        }
    }
}
Explanation
Step 1: Read the String
- The Scanner class is used to read a string input from the user. The nextLine() method captures the entire line, including spaces, as a single string.
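Using nextLine() rather than next() matters here: next() would return only the first whitespace-delimited token. The sketch below illustrates the difference by reading from a String instead of System.in so it runs without typing any input; the class ScannerLineDemo and its methods are hypothetical names used only for illustration.

```java
import java.util.Scanner;

public class ScannerLineDemo {

    // Reads the whole first line with nextLine() (the line separator is not included).
    public static String firstLine(String text) {
        try (Scanner scanner = new Scanner(text)) {
            return scanner.nextLine();
        }
    }

    // Reads only the first whitespace-delimited token with next().
    public static String firstToken(String text) {
        try (Scanner scanner = new Scanner(text)) {
            return scanner.next();
        }
    }

    public static void main(String[] args) {
        String text = "Java is great\nsecond line";
        System.out.println(firstLine(text));  // Java is great
        System.out.println(firstToken(text)); // Java
    }
}
```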
Step 2: Split the String into Words
- The split() method divides the string into words wherever whitespace occurs. The regex \\s+ matches one or more whitespace characters, so multiple spaces between words are treated as a single separator.
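To see why the regex \\s+ is used, it helps to compare it with splitting on a single space. The sketch below (class and method names are hypothetical) also shows one edge case: when the input starts with whitespace, split("\\s+") still yields one empty leading token, which trim() removes. In the program above that empty token can be counted at most once, so it never appears as a duplicate, but it does add a stray entry to the map.

```java
import java.util.Arrays;

public class SplitDemo {

    // Splitting on a single space produces empty strings between repeated spaces.
    public static String[] splitOnSpace(String s) {
        return s.split(" ");
    }

    // "\\s+" collapses runs of whitespace, but a leading run still yields one empty token.
    public static String[] splitOnWhitespace(String s) {
        return s.split("\\s+");
    }

    // Trimming first removes the leading empty token.
    public static String[] trimThenSplit(String s) {
        return s.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String input = "  Java   is  great";
        System.out.println(Arrays.toString(splitOnSpace(input)));      // [, , Java, , , is, , great]
        System.out.println(Arrays.toString(splitOnWhitespace(input))); // [, Java, is, great]
        System.out.println(Arrays.toString(trimThenSplit(input)));     // [Java, is, great]
    }
}
```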
Step 3: Store Word Counts
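- A HashMap stores each word as a key and its count as the value. For each word, getOrDefault(word, 0) returns the current count (or 0 if the word has not been seen yet), and put() stores that count plus one. Words are converted to lowercase first, so "Java" and "java" are counted as the same word.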
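The getOrDefault(word, 0) + 1 update in the program reads the current count and writes back the incremented value. Since Java 8, Map.merge can express the same update in a single call. The sketch below is an alternative idiom rather than what the program uses; the class WordCountMerge and its countWords method are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class WordCountMerge {

    // Counts lowercase words using Map.merge instead of getOrDefault + put.
    public static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            // merge stores 1 for a new key, otherwise adds 1 to the existing count
            counts.merge(word.toLowerCase(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = "Java is great and Java is powerful".split("\\s+");
        Map<String, Integer> counts = countWords(words);
        System.out.println(counts.get("java"));  // 2
        System.out.println(counts.get("great")); // 1
    }
}
```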
Step 4: Identify and Display Duplicate Words
- The program iterates through the HashMap entries and checks for words with a count greater than 1. These words are duplicates and are printed along with their counts.
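Because Step 4 prints inside the loop, an input with no repeated words prints only the "Duplicate words:" heading. One way to handle that case is to collect the duplicates first and then check whether any were found. The sketch below does this with a hypothetical duplicatesOnly helper and a hard-coded count map standing in for the one built in Step 3.

```java
import java.util.HashMap;
import java.util.Map;

public class DuplicateFilter {

    // Returns a new map holding only the entries whose count is greater than 1.
    public static Map<String, Integer> duplicatesOnly(Map<String, Integer> counts) {
        Map<String, Integer> duplicates = new HashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                duplicates.put(entry.getKey(), entry.getValue());
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Stand-in for the word counts built in Step 3
        Map<String, Integer> counts = new HashMap<>();
        counts.put("java", 2);
        counts.put("is", 2);
        counts.put("great", 1);

        Map<String, Integer> duplicates = duplicatesOnly(counts);
        if (duplicates.isEmpty()) {
            System.out.println("No duplicate words found.");
        } else {
            System.out.println("Duplicate words:");
            duplicates.forEach((word, count) -> System.out.println(word + ": " + count));
        }
    }
}
```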
Output
Enter a string: Java is great and Java is powerful
Duplicate words:
java: 2
is: 2
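HashMap makes no guarantee about iteration order, so for other inputs the duplicates will not necessarily be printed in the order they occur in the string. If that order matters, a LinkedHashMap, which iterates in insertion order, can be used in place of the HashMap. The sketch below makes that swap and returns the text instead of printing it so the result is easy to check; the class OrderedDuplicateWords and the method formatDuplicates are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OrderedDuplicateWords {

    // Builds the program's output text with words kept in first-occurrence order.
    public static String formatDuplicates(String input) {
        // LinkedHashMap iterates in insertion order, unlike HashMap
        Map<String, Integer> wordCountMap = new LinkedHashMap<>();
        for (String word : input.trim().split("\\s+")) {
            String w = word.toLowerCase();
            wordCountMap.put(w, wordCountMap.getOrDefault(w, 0) + 1);
        }
        StringBuilder out = new StringBuilder("Duplicate words:");
        for (Map.Entry<String, Integer> entry : wordCountMap.entrySet()) {
            if (entry.getValue() > 1) {
                out.append('\n').append(entry.getKey()).append(": ").append(entry.getValue());
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(formatDuplicates("Java is great and Java is powerful"));
    }
}
```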
Conclusion
This Java program demonstrates how to find and display duplicate words in a user-input string. It covers important concepts such as string manipulation, using maps to store word counts, and iterating through collections, making it a useful exercise for beginners learning Java programming.