Vikipeedia:Üldine arutelu: difference between revisions

--[[Kasutaja:Hallsilm|Hallsilm]] ([[Kasutaja arutelu:Hallsilm|arutelu]]) 4. august 2015, kell 21:35 (EEST)
 
== Wiki labels & Revision Scoring as a Service for Estonian Wikipedia ==
 
Hello Estonian Wikipedia,
 
I apologize for my complete lack of Estonian skills. I would very much welcome a translation of this post into Estonian.
 
Computers are very good at crunching numbers. Your average calculator can outsmart you in arithmetic. However, computers are terrible at pretty much everything else. Programming computers to undertake any task beyond computation, no matter how simple, tends to be very difficult. This is where Artificial Intelligence comes in. With Artificial Intelligence we teach computers how to solve problems without explicitly programming the solution. This is what we are doing.
 
We are working on a project called [[m:Research:Revision scoring as a service]], which aims to provide Artificial Intelligence infrastructure for quality control on MediaWiki and Wikimedia projects. Our system is already implemented and running on the Azerbaijani, English, French, Indonesian, Persian, Portuguese, Spanish, Turkish and Vietnamese editions of Wikipedia. We are hoping to adapt our tool to serve the Estonian language as well as [[m:Research:Revision scoring as a service/Word lists|a number of other languages]].
 
We are currently focusing mainly on vandalism detection, for which we offer an API ([[m:ORES]]) that provides scores. We have made an effort to keep our system robust.
 
The examples I'll provide are based on a machine learning algorithm that was trained on 20,000 reverted edits. This kind of modelling is problematic for two reasons. First, edits are also reverted for reasons unrelated to vandalism, such as mistakes by new users, so the model develops an unproductive bias. Second, it cannot distinguish good-faith users from malicious ones. To demonstrate our system I will give three examples from English Wikipedia, which I picked semi-randomly.
* [http://ores.wmflabs.org/scores/enwiki/reverted/674969374/ Score of 90%] [https://en.wikipedia.org/wiki/?diff=674969374 diff] [[:en:Moncef Mezghanni]]
** As visible in the diff, this is clearly something that shouldn't be welcome on English Wikipedia. The algorithm's confidence matches my human assessment.
* [http://ores.wmflabs.org/scores/enwiki/reverted/674986575/ Score of 75%] [https://en.wikipedia.org/wiki/?diff=674986575 diff] [[:en:Monin]]
** When I look at the diff, it isn't immediately clear to me whether this should be reverted. A detailed look reveals that the prior version had more neutral information, but the new version, albeit spammy, isn't clear-cut vandalism at a glance. The algorithm's confidence drops just as my human assessment does.
* [http://ores.wmflabs.org/scores/enwiki/reverted/674984786/ Score of 19%] [https://en.wikipedia.org/wiki/?diff=674984786 diff] [[:en:Curiosity killed the cat, but satisfaction brought it back]]
** As visible in the diff, this edit clearly improves the article. The algorithm's confidence that it should be reverted plummets as well; the algorithm is more confident that this edit should NOT be reverted.
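
The score links above all follow the same pattern: host, wiki, model name, revision ID. As a minimal sketch, assuming only that pattern (the helper name <code>score_url</code> is made up for illustration, and the address may have changed since this post was written), the links can be rebuilt like this:

```python
from urllib.parse import quote

# Host and path layout are copied from the example links in this post.
# Whether the service still answers at this address is an assumption.
ORES_BASE = "http://ores.wmflabs.org/scores"


def score_url(wiki: str, model: str, rev_id: int) -> str:
    """Build a score URL of the form <base>/<wiki>/<model>/<rev_id>/.

    `score_url` is a hypothetical helper, not part of any ORES client.
    """
    if rev_id <= 0:
        raise ValueError("rev_id must be a positive integer")
    return f"{ORES_BASE}/{quote(wiki, safe='')}/{quote(model, safe='')}/{rev_id}/"


if __name__ == "__main__":
    # The three revisions discussed above, scored by the 'reverted' model.
    for rev_id in (674969374, 674986575, 674984786):
        print(score_url("enwiki", "reverted", rev_id))
```

The same helper covers other models by changing the model name, for example <code>score_url("enwiki", "wp10", 649291958)</code>. It only builds the address; fetching and reading the response is left out because the response format is not described here.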
 
We are also working towards a system for article quality, where we use existing assessments by the [[:en:Wikipedia:Version 1.0 Editorial Team]] to train our system. We only have this system on English Wikipedia at the moment, but we would be more than happy to expand to other language editions. I am uncertain whether Estonian Wikipedia has a similar quality assessment scale. I have picked 5 random articles to demonstrate this.
* [http://ores.wmflabs.org/scores/enwiki/wp10/649291958/ Predicted: Start class] (not even assessed) [https://en.wikipedia.org/wiki/?oldid=649291958 Perm link] [[:en:Maidenhead Advertiser]]
* [http://ores.wmflabs.org/scores/enwiki/wp10/659931375/ Predicted: Stub class] (actually marked Stub class) [https://en.wikipedia.org/wiki/?oldid=659931375 Perm link] [[:en:Joel Turrill]]
* [http://ores.wmflabs.org/scores/enwiki/wp10/606448494/ Predicted: C class] (actually marked Stub class) [https://en.wikipedia.org/wiki/?oldid=606448494 Perm link] [[:en:Kajaanin Haka]]
* [http://ores.wmflabs.org/scores/enwiki/wp10/609061420/ Predicted: C class] (actually marked C class) [https://en.wikipedia.org/wiki/?oldid=609061420 Perm link] [[:en:Castell Arnallt]]
* [http://ores.wmflabs.org/scores/enwiki/wp10/674994124/ Predicted: Featured class] (actually marked Featured article) [https://en.wikipedia.org/wiki/?oldid=674994124 Perm link] [[:en:Hurricane Diane]]
 
A typical problem is that humans rarely reassess articles over time, or articles are never assessed in the first place. Our system addresses this problem by automating the assessment.
 
Unfortunately, we lack Estonian language features such as [[m:Research:Revision scoring as a service/Word lists/et|bad words, informal words and stop words]]. Help with these would be very valuable. We also need a localization of [[:en:Wikipedia:Labels]] to serve as our local landing page.
 
Once these are complete, we would like to start an edit quality campaign in which we ask the local community to hand-label about 2000 revisions as productive or damaging and as good faith or bad faith. This would be similar to the campaign on English Wikipedia, [[:en:Wikipedia:Labels/Edit quality]].
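
To make the two labelling questions concrete, here is a minimal sketch of how such hand labels could be stored and counted. The class and field names and the revision IDs are made up for illustration; this is not the actual Wiki labels data format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class EditLabel:
    """One hand label for one revision (hypothetical structure)."""
    rev_id: int
    damaging: bool    # productive (False) or damaging (True)
    good_faith: bool  # good faith (True) or bad faith (False)


def summarize(labels):
    """Count labels per category, keeping the two questions separate."""
    counts = Counter((label.damaging, label.good_faith) for label in labels)
    return {
        "productive": counts[(False, True)] + counts[(False, False)],
        "damaging_good_faith": counts[(True, True)],   # e.g. newcomer mistakes
        "damaging_bad_faith": counts[(True, False)],   # e.g. vandalism
    }


if __name__ == "__main__":
    # Made-up revision IDs, for illustration only.
    sample = [
        EditLabel(rev_id=1001, damaging=False, good_faith=True),
        EditLabel(rev_id=1002, damaging=True, good_faith=True),
        EditLabel(rev_id=1003, damaging=True, good_faith=False),
    ]
    print(summarize(sample))
```

Keeping "damaging" and "good faith" as separate answers is what lets a model tell a newcomer's mistake from vandalism, which training on reverted edits alone cannot do.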
 
After this we will be able to generate scores for revisions that are usable by gadgets such as ScoredRevisions as well as (potentially) tools like Huggle. If the community desires it, the scores can even be used to create a local vandalism reversion bot.
 
In a nutshell, our algorithm relies on community input to support the community. Feel free to ask any questions here, on [[m:Research talk:Revision scoring as a service|meta]], or on IRC in the #wikimedia-ai channel on the freenode server, where we hang out. You can also reach us at https://github.com/wiki-ai
 
--<small> [[User:とある白い猫|とある白い猫]]</small> <sup>[[User talk:とある白い猫|chi?]]</sup> 7. august 2015, kell 22:43 (EEST)
:[[User:Kruusamägi]] can you help with this? --<small> [[User:とある白い猫|とある白い猫]]</small> <sup>[[User talk:とある白い猫|chi?]]</sup> 7. august 2015, kell 22:43 (EEST)