Java word frequency (counting and printing word frequencies in Java)


This article discusses word frequency counting in Java and how to output the results. I hope you find it helpful.



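Counting word frequencies in an English document with Java and printing them from most to least frequent (please complete the code below), thanks!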

String result = sb.toString();
// Split on every character that is not a letter or digit (all tokens)
String[] Str = result.split("[^A-Za-z0-9]");
for (String string : Str) {
    singleSet.add(string);
    if ("".equals(string)) { // added by me: drop the empty token so blanks are not counted
        singleSet.remove("");
    }
}
Map<String, Integer> map = new HashMap<String, Integer>();
for (String childString : singleSet) {
    int count = 0;
    for (String fatherString : Str) {
        if (fatherString.equals(childString)) {
            count++;
        }
    }
    map.put(childString, count); // store the count in the HashMap
}
ArrayList<Entry<String, Integer>> l = new ArrayList<Entry<String, Integer>>(map.entrySet());
Collections.sort(l, new Comparator<Entry<String, Integer>>() {
    public int compare(Entry<String, Integer> e1, Entry<String, Integer> e2) {
        int v1 = e1.getValue();
        int v2 = e2.getValue();
        return v2 - v1; // change to v1 - v2 for ascending order
    }
});
for (Entry<String, Integer> e : l) {
    System.out.println(e.getKey() + " " + e.getValue());
}

The code is for reference only. I hope it helps.
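The nested loops above rescan the whole token array once for every distinct word. As a rough alternative sketch (not the original poster's code), the counts can be accumulated in a single pass with `Map.merge` and then sorted. The class name `WordFreqSketch` and its method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; the names are illustrative, not from the original answer.
class WordFreqSketch {

    // Counts each alphanumeric token in one pass over the text.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.split("[^A-Za-z0-9]+")) {
            if (!token.isEmpty()) { // split can yield a leading empty token
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns the entries ordered by count, highest first; ties are ordered by word.
    static List<Map.Entry<String, Integer>> sortByCountDesc(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = b.getValue().compareTo(a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> e : sortByCountDesc(countWords("to be or not to be"))) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Comparing with `compareTo` rather than subtracting the values also avoids integer overflow, although that is unlikely to matter for word counts.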

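Design a Java class that counts word frequencies in an English article and prints them from most to least frequent. Just modify the code below.

This exercise would be much more efficient if an extra class were allowed. If everything has to fit inside this skeleton, the code becomes cumbersome and inefficient.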

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class Article {
    // The article text
    String content;
    // The raw words after splitting
    String[] rawWords;
    // The distinct words after counting
    String[] words;
    // The frequency of each entry in words
    int[] wordFreqs;

    // Constructor: sets the article text
    // Extension: read the text from a file instead
    public Article() {
        content = "kolya is one of the richest films i've seen in some time . zdenek sverak plays a confirmed old bachelor ( who's likely to remain so ) , who finds his life as a czech cellist increasingly impacted by the five-year old boy that he's taking care of . though it ends rather abruptly-- and i'm whining , 'cause i wanted to spend more time with these characters-- the acting , writing , and production values are as high as , if not higher than , comparable american dramas . this father-and-son delight-- sverak also wrote the script , while his son , jan , directed-- won a golden globe for best foreign language film and , a couple days after i saw it , walked away an oscar . in czech and russian , with english subtitles . ";
    }

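    // Split the article on delimiters and store the result in rawWords
    public void splitWord() {
        // Punctuation is not part of any word, so replace every symbol with a space first
        final char SPACE = ' ';
        content = content.replace('\'', SPACE).replace(',', SPACE).replace('.', SPACE);
        content = content.replace('(', SPACE).replace(')', SPACE).replace('-', SPACE);
        // Anything separated by whitespace counts as a word; because ' was replaced
        // above, "i've" is split into two words. trim() avoids a leading empty token.
        rawWords = content.trim().split("\\s+");
    }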

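    // Count the words by scanning the array
    public void countWordFreq() {
        // Collect the distinct words in a set, then count each one by rescanning rawWords.
        // (A Map<String, Integer> updated in a single pass would be faster; this version
        // keeps to the set-plus-arrays layout of the exercise.)
        Set<String> set = new TreeSet<String>();
        for (String word : rawWords) {
            set.add(word);
        }
        Iterator<String> ite = set.iterator();
        List<String> wordsList = new ArrayList<String>();
        List<Integer> freqList = new ArrayList<Integer>();
        // The number of distinct words is not known in advance, so use lists first
        while (ite.hasNext()) {
            String word = ite.next();
            int count = 0; // number of occurrences of this word
            for (String str : rawWords) {
                if (str.equals(word)) {
                    count++;
                }
            }
            wordsList.add(word);
            freqList.add(count);
        }
        // Copy the results into the arrays
        words = wordsList.toArray(new String[0]);
        wordFreqs = new int[freqList.size()];
        for (int i = 0; i < freqList.size(); i++) {
            wordFreqs[i] = freqList.get(i);
        }
    }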

    // Sort the word array and the frequency array together, by frequency, descending
    public void sort() {
        class Word {
            private String word;
            private int freq;

            public Word(String word, int freq) {
                this.word = word;
                this.freq = freq;
            }
        }
        // Ordering: 1) by frequency, descending; 2) for equal frequency, alphabetically
        // descending, e.g. "abc", "ab", "aa"
        class WordComparator implements Comparator<Word> {
            public int compare(Word word1, Word word2) {
                if (word1.freq < word2.freq) {
                    return 1;
                } else if (word1.freq > word2.freq) {
                    return -1;
                } else {
                    // Equal frequency: reverse lexicographic order
                    return word2.word.compareTo(word1.word);
                }
            }
        }
        List<Word> wordList = new ArrayList<Word>();
        for (int i = 0; i < words.length; i++) {
            wordList.add(new Word(words[i], wordFreqs[i]));
        }
        Collections.sort(wordList, new WordComparator());
        for (int i = 0; i < wordList.size(); i++) {
            Word wor = wordList.get(i);
            words[i] = wor.word;
            wordFreqs[i] = wor.freq;
        }
    }

    // Print the sorted result
    public void printResult() {
        System.out.println("Total " + words.length + " different words in the content!");
        for (int i = 0; i < words.length; i++) {
            System.out.println(wordFreqs[i] + " " + words[i]);
        }
    }

    // Exercise the class
    public static void main(String[] args) {
        Article a = new Article();
        a.splitWord();
        a.countWordFreq();
        a.sort();
        a.printResult();
    }
}

-----------------------

Total 99 different words in the content!

5 and

4 the

4 i

4 a

3 as

2 with

2 who

2 to

2 time

2 sverak

2 son

2 s

2 old

2 of

2 it

2 in

2 his

2 czech

1 zdenek

1 year

1 wrote

1 writing

1 won

1 whining

1 while

1 wanted

1 walked

1 ve

1 values

1 though

1 this

1 these

1 that

1 than

1 taking

1 subtitles

1 spend

1 some

1 so

1 seen

1 script

1 saw

1 russian

1 richest

1 remain

1 rather

1 production

1 plays

1 oscar

1 one

1 not

1 more

1 m

1 likely

1 life

1 language

1 kolya

1 jan

1 is

1 increasingly

1 impacted

1 if

1 higher

1 high

1 he

1 golden

1 globe

1 foreign

1 for

1 five

1 finds

1 films

1 film

1 father

1 english

1 ends

1 dramas

1 directed

1 delight

1 days

1 couple

1 confirmed

1 comparable

1 characters

1 cellist

1 cause

1 care

1 by

1 boy

1 best

1 bachelor

1 away

1 are

1 an

1 american

1 also

1 after

1 acting

1 abruptly
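The constructor comment mentions reading the article from a file as an extension. A minimal sketch of that step, assuming the file is UTF-8 text, could look like the following. `ArticleLoader` and `load` are hypothetical names, not part of the original class.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical helper; not part of the original Article class.
class ArticleLoader {

    // Reads the whole file as UTF-8 (an assumption about the encoding) and
    // lowercases it so that "The" and "the" are counted as the same word.
    static String load(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return new String(bytes, StandardCharsets.UTF_8).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary file so the example runs on its own.
        Path tmp = Files.createTempFile("article", ".txt");
        Files.write(tmp, "Kolya is one of the richest films".getBytes(StandardCharsets.UTF_8));
        System.out.println(load(tmp));
        Files.delete(tmp);
    }
}
```

A constructor such as `Article(Path file)` could then assign the returned string to `content` instead of the hard-coded sample text.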

Java word frequency counting

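In Java, a File object can represent either a file or a directory (what you call a folder). So you can pass the folder's path straight to new File(path), call listFiles() to get an array of all the files in that folder, and then open an input stream on each file in turn. For example: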

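public void directory() {
    // The backslash must be escaped; "E:\temp" would contain a tab character
    File dir = new File("E:\\temp");
    // listFiles() returns null if dir is not a directory or cannot be read
    File[] files = dir.listFiles();
}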
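To show how the file array might be used, here is a small sketch that walks the regular files directly inside a directory and adds up their whitespace-separated tokens. The class and method names are illustrative assumptions, and the files are read with the platform default charset.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Hypothetical example class; names are not from the original answer.
class DirectoryWordCount {

    // Counts the whitespace-separated tokens in one file.
    static int countTokens(File file) throws FileNotFoundException {
        int n = 0;
        try (Scanner in = new Scanner(file)) {
            while (in.hasNext()) {
                in.next();
                n++;
            }
        }
        return n;
    }

    // Sums the token counts of the regular files directly inside dir.
    static int countTokensInDirectory(File dir) throws FileNotFoundException {
        File[] files = dir.listFiles();
        if (files == null) {
            return 0; // not a directory, or it could not be read
        }
        int total = 0;
        for (File f : files) {
            if (f.isFile()) { // skip subdirectories
                total += countTokens(f);
            }
        }
        return total;
    }

    public static void main(String[] args) throws FileNotFoundException {
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println(countTokensInDirectory(dir) + " tokens in " + dir.getPath());
    }
}
```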

How do I count the total number of words (not characters) in a txt file in Java? I need it for the TF-IDF formula

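Term frequency (TF) = number of times a word appears in the article

Because articles differ in length, TF is usually normalized so that different articles can be compared:

Term frequency (TF) = number of times a word appears in the article / total number of words in the article

or:

Term frequency (TF) = number of times a word appears in the article / number of occurrences of the most frequent word in the article

Inverse document frequency (IDF), in one commonly used form:

IDF = log(total number of documents in the corpus / (number of documents containing the word + 1))

TF-IDF:

TF-IDF = term frequency (TF) * inverse document frequency (IDF)

TF-IDF is proportional to how often a word occurs in the document and inversely related to how often the word occurs across the language as a whole.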
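The question asks for the total word count, which is the denominator of the normalized TF. A minimal sketch follows, assuming words are runs of ASCII letters and assuming the common IDF form log(N / (1 + df)). `TfIdfSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; the tokenization rule and IDF variant are assumptions.
class TfIdfSketch {

    // Splits text into lowercase words; the list size is the total word count.
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) {
                words.add(t);
            }
        }
        return words;
    }

    // TF = occurrences of the term / total number of words in the document.
    static double tf(List<String> doc, String term) {
        if (doc.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        for (String w : doc) {
            if (w.equals(term)) {
                hits++;
            }
        }
        return (double) hits / doc.size();
    }

    // IDF = log(N / (1 + df)); the "+ 1" smoothing is one common variant.
    static double idf(List<List<String>> corpus, String term) {
        int df = 0;
        for (List<String> doc : corpus) {
            if (doc.contains(term)) {
                df++;
            }
        }
        return Math.log((double) corpus.size() / (1 + df));
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
                tokenize("The cat sat on the mat."),
                tokenize("A dog barked."),
                tokenize("Birds sing at dawn."));
        List<String> doc = corpus.get(0);
        System.out.println("total words = " + doc.size());
        System.out.println("tf(the) = " + tf(doc, "the"));
        System.out.println("tf-idf(the) = " + tf(doc, "the") * idf(corpus, "the"));
    }
}
```

For a real txt file, the text passed to `tokenize` would be the file's contents rather than a string literal.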

Writing a word frequency counter in Java, please help!

import java.awt.BorderLayout;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import javax.swing.JButton;
import javax.swing.JFileChooser;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JPanel;
import javax.swing.JTextField;
import javax.swing.SwingUtilities;

public class TestWin extends JFrame implements ActionListener {
    private JButton openBtn = new JButton("Choose file");
    private JLabel resultLabel = new JLabel("", JLabel.CENTER);
    private JTextField textField = new JTextField();
    private JButton calcBtn = new JButton("Count");
    private JFileChooser fileChooser = new JFileChooser(".");
    private File file;

    public TestWin() {
        openBtn.addActionListener(this);
        calcBtn.addActionListener(this);
        calcBtn.setEnabled(false);
        resultLabel.setFont(resultLabel.getFont().deriveFont(60f));
        add(openBtn, "North");
        add(resultLabel, "Center");
        JPanel southPane = new JPanel(new BorderLayout());
        southPane.add(textField, "Center");
        southPane.add(calcBtn, "East");
        add(southPane, "South");
        setSize(400, 300);
        setLocationRelativeTo(null);
        setDefaultCloseOperation(EXIT_ON_CLOSE);
    }

    @Override
    public void actionPerformed(ActionEvent e) {
        Object source = e.getSource();
        if (this.openBtn == source) {
            // Let the user pick the file to search
            if (JFileChooser.APPROVE_OPTION == fileChooser.showOpenDialog(this)) {
                file = fileChooser.getSelectedFile();
                this.openBtn.setText(file.getName());
                this.openBtn.setEnabled(false);
                this.calcBtn.setEnabled(true);
            }
        } else if (this.calcBtn == source) {
            String word = this.textField.getText().trim();
            if (word.isEmpty()) {
                return; // nothing to search for; leave the buttons as they are
            }
            this.calcBtn.setEnabled(false);
            Scanner in = null;
            try {
                in = new Scanner(file);
                int count = 0;
                // Count the whitespace-separated tokens that equal the word exactly
                while (in.hasNext()) {
                    if (word.equals(in.next())) {
                        count++;
                    }
                }
                this.resultLabel.setText("" + count);
            } catch (FileNotFoundException e1) {
                e1.printStackTrace();
            } finally {
                if (in != null) {
                    in.close();
                    System.out.println("Scanner closed");
                }
            }
            // Reset the buttons so another file can be chosen
            this.calcBtn.setEnabled(true);
            this.openBtn.setEnabled(true);
            this.openBtn.setText("Choose file");
        }
    }

    public static void main(String[] args) {
        SwingUtilities.invokeLater(new Runnable() {
            @Override
            public void run() {
                new TestWin().setVisible(true);
            }
        });
    }
}
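The counter above compares whole whitespace-separated tokens with `equals`, so "time." or "Time" would not be counted when searching for "time". One possible refinement, sketched here with a hypothetical `countOccurrences` method, is to have the Scanner split on non-alphanumeric characters and compare case-insensitively. Whether that matches the intended behaviour is an assumption.

```java
import java.util.Scanner;

// Hypothetical helper; the matching rules are an assumption, not the original behaviour.
class TokenMatchSketch {

    // Counts tokens equal to word, ignoring case and surrounding punctuation.
    static int countOccurrences(String text, String word) {
        int count = 0;
        try (Scanner in = new Scanner(text)) {
            in.useDelimiter("[^A-Za-z0-9]+"); // any run of non-alphanumerics separates tokens
            while (in.hasNext()) {
                if (word.equalsIgnoreCase(in.next())) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOccurrences("Time flies. In time, time heals.", "time"));
    }
}
```

The same `useDelimiter` call works on a Scanner constructed from a File, so the loop in `actionPerformed` could adopt it with little change.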

Java program: counting word frequencies

Without further ado, here is the code:

import java.util.*;
import java.io.*;

public class wordsRate {
    public static void main(String[] args) throws Exception {
        BufferedReader infile = new BufferedReader(new FileReader("article.txt"));
        String string;
        // Start from an empty buffer; starting from null would prepend the text "null"
        StringBuilder text = new StringBuilder();
        while ((string = infile.readLine()) != null) {
            text.append(string).append(' '); // keep words on adjacent lines apart
        }
        infile.close();
        String file = text.toString().toLowerCase();
        file = file.replaceAll("[^A-Za-z]", " ");
        file = file.replaceAll("\\s+", " ").trim();
        String words[];
        words = file.split("\\s+");
        Map<String, Integer> hashMap = new HashMap<String, Integer>();
        for (int i = 0; i < words.length; i++) {
            String key = words[i];
            if (key.isEmpty()) {
                continue; // an empty article yields a single empty token
            }
            if (hashMap.get(key) != null) {
                int value = hashMap.get(key);
                value++;
                hashMap.put(key, value);
            } else {
                hashMap.put(key, 1);
            }
        }
        Map<String, Integer> treeMap = new TreeMap<String, Integer>(hashMap);
        BufferedWriter bw = new BufferedWriter(new FileWriter("result.txt"));
        // Below is my modification of your code:
        Iterator<Map.Entry<String, Integer>> iter = treeMap.entrySet().iterator();
        // Two new arrays, ss1 and ss2, as long as the map; they hold the keys and the values respectively
        String ss1[] = new String[treeMap.size()];
        int ss2[] = new int[treeMap.size()];
        int i = 0;
        while (iter.hasNext()) {
            Map.Entry<String, Integer> entry = iter.next();
            int val = entry.getValue();
            String key = entry.getKey();
            ss1[i] = key;
            ss2[i] = val;
            i++;
        }
        // Sort by count (ss2), swapping ss1 in step so each word stays aligned with its count.
        // With ">" the result is in descending order; use "<" for ascending.
        int sValue = 0;
        String sKey = "";
        for (int j = 0; j < ss2.length; j++) {
            for (int k = 0; k < i; k++) {
                if (ss2[j] > ss2[k]) {
                    sValue = ss2[j];
                    sKey = ss1[j];
                    ss2[j] = ss2[k];
                    ss1[j] = ss1[k];
                    ss2[k] = sValue;
                    ss1[k] = sKey;
                }
            }
        }
        for (int j = 0; j < ss2.length; j++) {
            System.out.println(ss1[j] + "=" + ss2[j]);
            bw.write(ss1[j] + "=" + ss2[j]);
            bw.newLine();
        }
        bw.close(); // flushes and releases result.txt
    }
}

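I wrote this code myself and tested it, so I hope you will accept the answer.

It does the job: the keys and values are copied into arrays, sorted, and then written to a txt file.

There is more than one way to sort and more than one way to implement this; which is best is a matter of opinion.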
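As one of those other ways, the map entries can be sorted directly instead of maintaining two parallel arrays. This is a sketch rather than the answerer's method. `ResultWriterSketch`, `formatSorted`, and `write` are hypothetical names, and the `word=count` line format simply mirrors the output above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of the original program.
class ResultWriterSketch {

    // Builds "word=count" lines ordered by count, highest first.
    static List<String> formatSorted(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> b.getValue().compareTo(a.getValue()));
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            lines.add(e.getKey() + "=" + e.getValue());
        }
        return lines;
    }

    // Writes the sorted lines to a file as UTF-8, one entry per line.
    static void write(Map<String, Integer> counts, Path out) throws IOException {
        Files.write(out, formatSorted(counts), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("a", 1);
        counts.put("the", 3);
        counts.put("of", 2);
        for (String line : formatSorted(counts)) {
            System.out.println(line);
        }
    }
}
```

Because `List.sort` is stable, words with equal counts keep the iteration order of the map that was passed in.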

That concludes this overview of word frequency counting and output in Java.

Published 2023-04-01 15:04:18