Java Program to Find Duplicate Words in a String

Dec 13, 2024

Introduction

Finding duplicate words in a string is a useful task in text processing, often needed for analyzing text or preparing data for natural language processing. This exercise helps you understand how to split strings into words, use data structures like maps in Java, and identify duplicates efficiently. This guide will walk you through writing a Java program that identifies and counts the duplicate words in a given string.

Problem Statement

Create a Java program that:

Prompts the user to enter a string.
Splits the string into words.
Finds and displays the words that appear more than once in the string.
Displays the count of each duplicate word.

Example:

Input: "Java is great and Java is powerful"
Output:
Duplicate words:
java: 2
is: 2

Solution Steps

Read the String: Use the Scanner class to take the string as input from the user.
Split the String into Words: Use the split() method to break the string into words.
Store Word Counts: Use a HashMap with each word as a key and the number of times it occurs as the value.
Identify Duplicate Words: Iterate through the map to find words with a count greater than 1.
Display the Duplicates: Print the duplicate words and their counts.
Close Resources: Use a try-with-resources statement so the Scanner is closed automatically.

Java Program

// Java Program to Find Duplicate Words in a String

import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Scanner;

public class DuplicateWords {
    public static void main(String[] args) {
        // Step 1: Read the string from the user
        // (try-with-resources closes the Scanner automatically)
        try (Scanner scanner = new Scanner(System.in)) {
            System.out.print("Enter a string: ");
            // Guard against end of input, where nextLine() would throw
            String input = scanner.hasNextLine() ? scanner.nextLine() : "";

            // Step 2: Split the string into words
            // trim() first so leading spaces do not produce an empty first token
            String[] words = input.trim().split("\\s+");

            // Step 3: Store word counts
            Map<String, Integer> wordCountMap = new HashMap<>();
            for (String word : words) {
                if (word.isEmpty()) {
                    continue; // blank input splits into a single empty string
                }
                // Normalize case so "Java" and "java" count as the same word
                String key = word.toLowerCase(Locale.ROOT);
                wordCountMap.put(key, wordCountMap.getOrDefault(key, 0) + 1);
            }

            // Step 4: Identify and display duplicate words
            System.out.println("Duplicate words:");
            for (Map.Entry<String, Integer> entry : wordCountMap.entrySet()) {
                if (entry.getValue() > 1) {
                    System.out.println(entry.getKey() + ": " + entry.getValue());
                }
            }
        }
    }
}

Explanation

Step 1: Read the String

The Scanner class is used to read a string input from the user. The nextLine() method captures the entire line as a string.
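To see why nextLine() is the right call here, the small sketch below compares it with Scanner.next(), which stops at the first whitespace and would return only one word. It reads from a String instead of System.in so it runs without typing anything; the class and method names (ReadLineDemo, readWholeLine, readFirstToken) are hypothetical and used only for illustration.

```java
import java.util.Scanner;

// Hypothetical demo class: compares Scanner.nextLine() with Scanner.next().
// Input comes from a String instead of System.in so it runs unattended.
class ReadLineDemo {

    // Reads the whole first line, spaces included (line separator dropped).
    static String readWholeLine(String text) {
        try (Scanner scanner = new Scanner(text)) {
            // hasNextLine() guards against empty input, where nextLine()
            // would throw NoSuchElementException.
            return scanner.hasNextLine() ? scanner.nextLine() : "";
        }
    }

    // Reads only the first whitespace-delimited token.
    static String readFirstToken(String text) {
        try (Scanner scanner = new Scanner(text)) {
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    public static void main(String[] args) {
        String text = "Java is great and Java is powerful\n";
        System.out.println(readWholeLine(text));  // Java is great and Java is powerful
        System.out.println(readFirstToken(text)); // Java
    }
}
```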

Step 2: Split the String into Words

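The split() method divides the string into words wherever it finds whitespace. The regular expression \s+ (written "\\s+" inside a Java string literal) matches one or more whitespace characters, so several spaces or tabs in a row count as a single separator. Punctuation is not removed, so "Java," and "Java" are treated as different words.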
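The sketch below shows how split("\\s+") behaves on untidy input and compares it with one possible punctuation-aware separator. The class name SplitDemo, the helper split(), and the NON_WORD pattern (any run of characters that are not letters, digits, or apostrophes) are assumptions for illustration, not part of the original program; keeping apostrophes means a word like "don't" stays whole.

```java
import java.util.Arrays;

// Hypothetical demo class: whitespace splitting vs. an assumed
// punctuation-aware alternative, on the same untidy input.
class SplitDemo {

    // Separator used by the main program: one or more whitespace characters.
    static final String WHITESPACE = "\\s+";

    // Assumed alternative: any run of characters that are not letters,
    // digits, or apostrophes, so "Java," yields the token "Java".
    static final String NON_WORD = "[^\\p{L}\\p{N}']+";

    // Splits on the given regex and drops empty tokens, such as the one
    // split() produces when the input starts with a separator.
    static String[] split(String input, String separatorRegex) {
        return Arrays.stream(input.split(separatorRegex))
                .filter(token -> !token.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String input = "  Java, is great. Java is powerful";

        // Raw split(): the leading spaces yield an empty first token.
        System.out.println(Arrays.toString(input.split(WHITESPACE)));
        // [, Java,, is, great., Java, is, powerful]

        System.out.println(Arrays.toString(split(input, WHITESPACE)));
        // [Java,, is, great., Java, is, powerful]

        System.out.println(Arrays.toString(split(input, NON_WORD)));
        // [Java, is, great, Java, is, powerful]
    }
}
```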

Step 3: Store Word Counts

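A HashMap stores each word as a key and its count as the value. For every word, getOrDefault() returns the current count for that word (or 0 if it has not been seen yet), and put() stores that count plus one. Words are converted to lowercase first, so "Java" and "java" are counted as the same word.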
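The same counting step can also be written with Map.merge(), which inserts 1 for a new key and otherwise combines the old value with 1 using Integer::sum. The sketch below is an alternative formulation, not the article's code; the class name CountDemo and the method countWords are hypothetical. It also lowercases with Locale.ROOT, because the no-argument toLowerCase() depends on the default locale (under a Turkish locale, for example, "I" lowercases to a dotless "ı").

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical demo class: word counting with Map.merge().
class CountDemo {

    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            // Locale.ROOT makes lowercasing independent of the default locale.
            String key = word.toLowerCase(Locale.ROOT);
            // Stores 1 for a new key, otherwise adds 1 to the existing count;
            // equivalent to put(key, getOrDefault(key, 0) + 1).
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = "Java is great and java IS powerful".split("\\s+");
        Map<String, Integer> counts = countWords(words);
        System.out.println(counts.get("java"));  // 2
        System.out.println(counts.get("is"));    // 2
        System.out.println(counts.get("great")); // 1
    }
}
```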

Step 4: Identify and Display Duplicate Words

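The program iterates through the HashMap entries and checks for words with a count greater than 1. These words are duplicates and are printed along with their counts. Note that HashMap does not guarantee any iteration order, so the duplicates are not necessarily printed in the order they first appear in the input.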
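If the duplicates should be listed in the order they first appear, one option is to count into a LinkedHashMap, which iterates in insertion order, and then drop every entry with a count below 2. The sketch below assumes the same whitespace splitting and lowercasing as the article's program; the class name OrderedDuplicates and the method findDuplicates are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical demo class: duplicate words in first-occurrence order.
class OrderedDuplicates {

    static Map<String, Integer> findDuplicates(String input) {
        // LinkedHashMap iterates in insertion order; updating an existing
        // key with merge() does not move it.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : input.trim().split("\\s+")) {
            counts.merge(word.toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        // Keep only words seen more than once. This also removes the single
        // empty token that split() returns for blank input.
        counts.values().removeIf(count -> count < 2);
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> duplicates =
                findDuplicates("Java is great and Java is powerful");
        System.out.println("Duplicate words:");
        // Prints "java: 2" then "is: 2"
        duplicates.forEach((word, count) -> System.out.println(word + ": " + count));
    }
}
```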

Output

Enter a string: Java is great and Java is powerful
Duplicate words:
java: 2
is: 2

Conclusion

This Java program demonstrates how to find and display duplicate words in a user-input string. It covers important concepts such as string manipulation, using maps to store word counts, and iterating through collections, making it a useful exercise for beginners learning Java programming.