Wordcount doesn't work for languages that don't use spaces for word separation

Originally reported on Google Code with ID 2049
What archive revision are you testing on? (See the revision number in the footer.) 3363

Original description:

If appropriate, enter the URL of a page where the problem can be seen:
http://archiveofourown.org/works/130634
http://archiveofourown.org/languages/ja/works

Look at some works written in languages with non-Roman character sets. Wonder why they apparently have 0 words!

We've fixed word counts for languages where words are delimited by spaces. That leaves the non-space-delimited ones, such as Chinese, Japanese, and Thai.

How to test:

  • Post a work in one of these languages: Chinese, Japanese (try both Hiragana and Katakana scripts), Thai. Their word counts should be the character counts.

    • Note: “this implementation doesn’t include punctuation. So for the same Chinese text, this word count would be about 10% less than the word count given by MS Word (but it should be the same as Apple’s Pages since Pages doesn’t include punctuation either), and around 10x more than the original word count.”

  • Find an old work in Chinese/Japanese/Thai, then edit and save it to update the word count. Check that the new word count is correct, i.e. the same as it would be for a newly posted work.

  • Post a work in English, Spanish, Korean, Russian (Cyrillic script). Their word counts should be their usual space-delimited word counts, matching how the Archive behaves as of version 0.9.223.
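
For testers who want a rough reference count to compare against, the expected behavior described above can be sketched in Python. This is an illustration only, not the Archive's actual implementation: the function names, the list of Unicode name prefixes, and the handling of apostrophes and hyphens are all assumptions made for this sketch.

```python
import unicodedata

# Assumption: these Unicode character-name prefixes stand in for "scripts that
# are counted per character". They are illustrative, not taken from the Archive.
PER_CHARACTER_PREFIXES = (
    "CJK UNIFIED IDEOGRAPH",
    "HIRAGANA",
    "KATAKANA",
    "THAI CHARACTER",
)
# Assumption: apostrophes and hyphens inside a word do not split it.
WORD_JOINERS = "'’-"


def counts_as_own_word(ch: str) -> bool:
    """True for a letter in a script that does not separate words with spaces."""
    if not ch.isalnum():  # punctuation, spaces, and combining marks are skipped
        return False
    return unicodedata.name(ch, "").startswith(PER_CHARACTER_PREFIXES)


def word_count(text: str) -> int:
    """Count per character for Chinese/Japanese/Thai, per word otherwise."""
    count = 0
    in_word = False  # are we inside a space-delimited word?
    for ch in text:
        if counts_as_own_word(ch):
            count += 1
            in_word = False
        elif ch.isalnum():
            if not in_word:  # first character of a new word
                count += 1
            in_word = True
        elif not (in_word and ch in WORD_JOINERS):
            in_word = False  # whitespace or punctuation ends the word
    return count


print(word_count("我爱你，中国。"))             # Chinese, punctuation ignored
print(word_count("Hello, world! Don't stop."))  # space-delimited
```

Because punctuation is skipped, the Chinese sample counts only its five ideographs, which is consistent with the note above about counts coming out lower than MS Word's.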

Status:
Assignee: Tatyafinwe
Reporter: Zooey.Glass04
Roadmap: Internationalization, Works
Priority: Medium
Fix versions:
Components: BackEnd
Difficulty: Hard
Milestone: Internal 0.10
Google Code Issue ID: 2049