– Hi everyone, my name is Neo Ge. I co-founded IPFSMAIN back in 2018, and we are a service provider inside the Filecoin ecosystem. We've been committed to the Filecoin project for almost three years now. It was a historic moment when the Filecoin mainnet launched last week, and I'm really happy to present today. The talk today will be very similar to the Miner Show and Tell I gave last month. I'm going to go through our Filecoin mining solutions, which include our mining architecture and network topology, and then I'm going to talk about our hardware information as well.
Let's get started. Over the past two years, we have had multiple architecture designs for large-scale Filecoin mining. After thousands of hours of testing, we've decided to use this Filecoin mining architecture. When we talk about the architecture, we don't mean the macro architecture. We believe a Filecoin mining architecture is the design of the communications network, including the physical and logical layout of the network. It also includes the framework of specifications for the equipment, services, protocols, and functions of Filecoin mining.
In the current mining solution, we divide the major hardware into three different pieces: the storage rigs, the sealing rigs, and the SNARK rigs. By separating the storage and computational parts, this architecture gives us really good scalability and improves hardware utilization efficiency at tremendous scale. In this design, we want the storage rigs to complete the first step of PoSt, generating candidates, on their own. At the same time, we use the Sealing Rig Cluster for seal_pre_commit_phase1 (aka PC1), PC2, and C1, and the SNARK Rig Cluster for C2 and window PoSt.
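To make the division of labor concrete, here is a minimal Go sketch of that routing. This is our own illustration rather than IPFSMAIN's actual scheduler, and the cluster names are hypothetical:

```go
package main

import "fmt"

// TaskType enumerates the sealing and proving stages named above.
type TaskType string

const (
	GenerateCandidates TaskType = "generate_candidates" // first step of PoSt
	PC1                TaskType = "seal_pre_commit_phase1"
	PC2                TaskType = "seal_pre_commit_phase2"
	C1                 TaskType = "seal_commit_phase1"
	C2                 TaskType = "seal_commit_phase2" // GPU-heavy SNARK proving
	WindowPoSt         TaskType = "window_post"
)

// routing mirrors the three-cluster split described in the talk.
var routing = map[TaskType]string{
	GenerateCandidates: "storage-rigs",
	PC1:                "sealing-rig-cluster",
	PC2:                "sealing-rig-cluster",
	C1:                 "sealing-rig-cluster",
	C2:                 "snark-rig-cluster",
	WindowPoSt:         "snark-rig-cluster",
}

func main() {
	for task, cluster := range routing {
		fmt.Printf("%-24s -> %s\n", task, cluster)
	}
}
```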
We modified and rewrote part of the code and made some redundancy changes to improve the PoSt success rate. By doing all that with this mining architecture, we've significantly improved the mining efficiency, especially for P1 and C2. For example, we managed to finish C2 in under 10 minutes at the start of Space Race 1, and we've made some further improvements recently as well. We believe a robust network architecture should facilitate system-level functionality, as well as robustness and scalability, especially in the face of changes in software and hardware components, application functions, and so on. In our network topology, we use two different internet service providers for each WAN to increase redundancy. The communication core switch uses multiple switches with 10-gigabit optical ports for stacking, and is mainly connected to lotus_node and lotus_miner. The data core switch uses dual switches with 10-gigabit optical ports for stacking, and is connected to lotus_miner, storage, seal_worker, and the SNARK cluster.
And the storage access switches have four 10-gigabit optical ports plus 24 gigabit electrical ports, with dual-switch stacking of course. The 10-gigabit optical ports are connected to the data aggregation switches, and the gigabit electrical ports are connected to storage. Each storage rig is connected with two network cables, one to each of the two switches, and we did dual network port bonding to improve storage network stability as well.
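As a quick reference, here is a toy Go model of the connectivity just described; the node names are ours, and this is an illustration, not a switch configuration:

```go
package main

import "fmt"

// links models the layers described above: dual-ISP WAN, stacked
// communication and data core switches, and dual-homed storage access.
var links = map[string][]string{
	"wan (isp-a + isp-b)":     {"communication-core"},
	"communication-core (x2)": {"lotus_node", "lotus_miner"},
	"data-core (x2)":          {"lotus_miner", "seal_worker", "snark-cluster", "storage-access"},
	"storage-access (x2)":     {"storage (dual-homed, bonded ports)"},
}

func main() {
	for sw, peers := range links {
		fmt.Println(sw, "->", peers)
	}
}
```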
When you are mining Filecoin at large scale, especially at something over hundreds of petabytes, a well-built operations and maintenance platform is a must to keep everything in order. So next I would like to highlight our one-stop O&M platform, called FIL Manager, as part of our Filecoin mining solution. The FIL Manager includes a CMDB, CI/CD, a monitoring system, and a work order system. I'm going to break each system down in detail with a screenshot.
The Configuration Management Database, aka CMDB, is an ITIL database used to store information about hardware and software assets. It is useful for breaking configuration items down into logical layers. The CMDB helps us understand the relationships between the components of our Filecoin mining system and track their configurations. It can also be used for things like impact analysis, root cause analysis, and change management. In the FIL Manager, we use it to help us automatically discover hosts, get host hardware information, identify host failures, and identify which cluster each host is working for in real time.
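As a rough idea of what such a configuration item could look like, here is a minimal Go sketch; the fields and values are hypothetical, not FIL Manager's actual schema:

```go
package main

import "fmt"

// ConfigurationItem is a minimal model of a CMDB entry: a host plus the
// hardware facts and cluster relationship we want to track in real time.
type ConfigurationItem struct {
	Hostname string
	CPU      string
	RAMGiB   int
	GPUs     []string
	Cluster  string // e.g. "sealing-rig-cluster" or "snark-rig-cluster"
	Healthy  bool
}

func main() {
	inventory := []ConfigurationItem{
		{Hostname: "seal-01", CPU: "AMD EPYC", RAMGiB: 512, Cluster: "sealing-rig-cluster", Healthy: true},
		{Hostname: "snark-01", CPU: "AMD Ryzen 9", RAMGiB: 128, GPUs: []string{"RTX 2080 Ti"}, Cluster: "snark-rig-cluster", Healthy: false},
	}
	// A failure report like this is what feeds impact and root-cause analysis.
	for _, ci := range inventory {
		if !ci.Healthy {
			fmt.Printf("host %s in %s needs attention\n", ci.Hostname, ci.Cluster)
		}
	}
}
```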
CI/CD means continuous integration and continuous delivery. It bridges the gaps between development and operations activities and teams by enforcing automation in the building, testing, and deployment of applications. Our CI/CD module helps us perform fast and accurate deployments whenever there's an update in the Filecoin network. The component monitoring system provides comprehensive monitoring of the health status of components like lotus, miner, seal, storage, snark, and window PoSt. For example, the lotus monitor helps us with node synchronization and watches the mempool in case our messages get stuck there, and the miner monitor helps us with order status, miner status, order management such as testing different pricing strategies, and sector lifecycle tracking.
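To illustrate sector lifecycle tracking, here is a small Go sketch. The stage names echo Lotus's sector states, but the tracker itself is a simplification of ours, not FIL Manager code:

```go
package main

import "fmt"

// lifecycle lists the stages a sector moves through while sealing and proving.
var lifecycle = []string{
	"Packing", "PreCommit1", "PreCommit2", "Committing", "Proving",
}

type Sector struct {
	ID    uint64
	Stage int // index into lifecycle
}

// Advance moves the sector to the next stage, stopping at the final one.
func (s *Sector) Advance() {
	if s.Stage < len(lifecycle)-1 {
		s.Stage++
	}
}

func main() {
	s := &Sector{ID: 42}
	for i := 0; i < 3; i++ {
		fmt.Printf("sector %d: %s\n", s.ID, lifecycle[s.Stage])
		s.Advance()
	}
}
```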
The seal monitor focuses on the working status of the seal workers, such as what the sector ID is and what stage that sector is currently in. It is able to identify various errors that occur during the sealing process and deal with faults automatically: a faulted worker will stop receiving new tasks until the fault is repaired. The storage monitor helps us with sector storage status. It actively finds storage failures, performs data migration in real time, and makes the migration history traceable.
The snark monitor focuses on the working status of the SNARK workers and the resources they demand, and the window PoSt monitor is responsible for the PoSt working status and history of all miners. These monitors, as mentioned, are all part of the component monitoring system. After that, we have the host monitoring interface. All the monitoring systems work together to identify and classify faults in hardware, services, and networks, and to notify the corresponding staff to deal with them. In the meantime, the work order system manages the workflow and can track various operations
and maintenance events. All faults identified through the monitoring system can automatically generate work orders and dispatch them to the corresponding person in charge. This is our exception handling system: the system catches all exceptions through logs and notifies O&M personnel to deal with them. If there are any severe exceptions, like a synchronization exception or a window PoSt exception, the system will call the corresponding O&M personnel through the phone interface and ask them to deal with the exception.
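A minimal Go sketch of that escalation logic, under our own hypothetical names: severe exceptions page the on-duty engineer by phone, everything else becomes a work order:

```go
package main

import "fmt"

// Exception is a log-derived event. Critical ones, such as sync or window
// PoSt failures, trigger a phone call; the rest become work orders.
type Exception struct {
	Kind     string
	Critical bool
}

func dispatch(e Exception) string {
	if e.Critical {
		return fmt.Sprintf("CALL on-duty engineer: %s", e.Kind)
	}
	return fmt.Sprintf("open work order: %s", e.Kind)
}

func main() {
	for _, e := range []Exception{
		{Kind: "sync exception", Critical: true},
		{Kind: "window PoSt exception", Critical: true},
		{Kind: "disk SMART warning", Critical: false},
	} {
		fmt.Println(dispatch(e))
	}
}
```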
Hardware information: just like I presented in the Miner Show and Tell, I believe designing a good Filecoin mining solution is like building a wooden bucket. No matter how tall the bucket is, its capacity depends on the shortest plank. And Filecoin mining is different: it is different from any other computational crypto mining like Bitcoin or Ethereum. Filecoin mining is complex, especially when you are mining at a meaningful scale. If you are sealing too fast, for example, or too slow, so that other components have to wait and stay idle, then you are probably wasting hardware performance. So the key is to find a balance with cost-effective hardware. It is obvious that cost efficiency is another metric of a Filecoin mining solution, and as a Filecoin miner, you just need to find the right hardware instead of the best hardware.
I made an analogy last time: it's like running an Uber with a Lamborghini versus a Honda Civic. The hardware requirements for Filecoin mining are highly related to the computational resources needed to seal a sector and to generate regular proofs of spacetime for every sealed sector. These operations are computationally expensive, and things like the algorithm implementation and the sector size affect the hardware requirements.
So let's focus on the current mainnet with its 32 GiB sectors. First, as a storage miner, you need a CPU with at least 8 cores. The CPU also has to support the Intel SHA extensions. If you are choosing AMD CPUs, you will need at least the Zen microarchitecture or later, and if you are choosing Intel CPUs, you will need, I believe, an Ice Lake processor or later.
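On Linux you can verify the SHA extensions quickly, since the capability shows up as the sha_ni flag in /proc/cpuinfo. A small Go sketch:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Checks /proc/cpuinfo for the sha_ni flag, which indicates the Intel SHA
// extensions that SDR sealing relies on for acceptable performance.
func main() {
	data, err := os.ReadFile("/proc/cpuinfo")
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not read cpuinfo:", err)
		return
	}
	if strings.Contains(string(data), "sha_ni") {
		fmt.Println("CPU supports SHA extensions (sha_ni): good for sealing")
	} else {
		fmt.Println("no sha_ni flag found: SDR sealing will be much slower")
	}
}
```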
Secondly, you would need at least 128 GiB of RAM, and as indicated in the info box there, this should be complemented with 256 gigabytes of swap on a fast NVMe SSD storage medium. A powerful GPU can significantly speed up the SNARK computations. The Filecoin team has provided a list of supported GPUs, but if you want to use other GPUs, you can definitely do that; you just have to enable them manually. And remember, please don't mix AMD GPUs and NVIDIA GPUs together, as it will cause issues with OpenCL.
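For reference, manually enabling an unlisted card goes through the BELLMAN_CUSTOM_GPU environment variable that rust-fil-proofs reads, in the form "&lt;device name&gt;:&lt;CUDA core count&gt;". Below is a minimal Go sketch that launches a worker with it set; the card is only an example, and a real worker still needs its usual API environment and flags:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

func main() {
	// BELLMAN_CUSTOM_GPU adds a GPU not on the built-in support list.
	// The GTX 1080 (2560 CUDA cores) below is just an example card.
	env := append(os.Environ(), `BELLMAN_CUSTOM_GPU=GeForce GTX 1080:2560`)

	cmd := exec.Command("lotus-worker", "run") // assumes lotus-worker is on PATH
	cmd.Env = env
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	fmt.Println("starting worker with custom GPU enabled")
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "worker exited:", err)
	}
}
```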
And disk selection is really important as well, given the fact that the network parameters are over 100 gigabytes and need to be read and verified during miner start. So you will need at least one terabyte of NVMe-based disk space for cache storage. This disk will be used to store data during the sealing process, to cache files and parameters, and to serve as a general temporary storage location. On top of that, you will need additional hard drives for the final storage of sealed sectors, the Lotus chain data, and so on.
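Those parameters are cached on disk, by default under /var/tmp/filecoin-proof-parameters (relocatable with the FIL_PROOFS_PARAMETER_CACHE environment variable). A small Go sketch to check how much space the cache is using:

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// Sums the size of the proof parameter cache mentioned above.
func main() {
	dir := os.Getenv("FIL_PROOFS_PARAMETER_CACHE")
	if dir == "" {
		dir = "/var/tmp/filecoin-proof-parameters" // the default location
	}

	var total int64
	err := filepath.WalkDir(dir, func(_ string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() {
			if info, err := d.Info(); err == nil {
				total += info.Size()
			}
		}
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Printf("parameter cache at %s: %.1f GiB\n", dir, float64(total)/(1<<30))
}
```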
Please note that the specs above are our general requirements, assuming all the mining operations run on the same machine, which is different from our solution and the requirements shown on the current slide. Like I mentioned earlier, we've managed to separate everything into three different parts. Another thing I'd like to highlight is that we are ready for the algorithm upgrade from SDR to NSE.
That's actually one of the reasons why we chose the current architecture and hardware designs: they're built for the algorithm upgrade. And I believe that to make Filecoin a successful project, we need to move most of the parties in the ecosystem closer to users. The current storage service providers can only go down to the IDC level, and I believe one of the most important milestones of the Filecoin project will be the moment we lower the hardware requirements to a level where everyone can mine at any scale. As a storage miner, we want that milestone to come sooner rather than later. It may sound crazy for me to say this, because we are one of the largest Filecoin miners in the world, and reducing the hardware requirements would lead to more miners and thus increase the cost per Filecoin we mine. But the key is to treat Filecoin mining as a long-term endeavor: to be a successful Filecoin miner, the Filecoin network needs to be a successful network first. And that's why we are supportive of even the least miner-friendly parameters in the Filecoin crypto-economic model.
It's super exciting to imagine the future of the distributed web. I believe Filecoin will become a big part of it, where we will see lots of storage miners, retrieval miners, and repair miners. Hardware requirements won't be a barrier anymore. With the repair miners' help, people could probably mine Filecoin with whatever hardware they desire, and whenever they want, with flexible time framing instead of 24/7, and people will easily transfer petabytes of data with extremely cheap bandwidth. I think this is the end of my talk. Thank you so much for watching. I think, yeah, I'm two minutes over the time, sorry about that. Next, I'm going to turn it over to the next speaker, from StorageHive.
– Yeah, so my talk is going to be, I think, like the one I gave during Space Race; it's going to be a similar kind of information we will cover, but definitely quite a different approach from what Neo just talked about. So on my side, I kind of looked at it, you know, as a small miner, a solo miner. I do actually have two other people in my company as well that are helping out, so we're not quite solo, but certainly on the small side for now. We are quite keen and interested in the Filecoin platform; I want to understand the economics of the platform and how we can scale it up.
So I really kind of focused on, you know, how to build the platform as cost-effectively as possible while at the same time maintaining reliability. A lot of people talk about, you know, what they are doing and what worked. I actually want to step back a little bit; as my wording there says, you can kind of learn from my mistakes (scratched out, you know, as "experimentation"). That kind of thing is really key for people that might be new. I know the project has been around for three years, but now, of course, it's going to get more popular: mainnet has launched, and a lot of the crypto news sites are talking about it.
So more and more people are going to come onto the platform and want to participate. Hopefully some of the information I share will help people speed up their process of getting onto the platform. It certainly needs to be reiterated: this is not like, you know, Bitcoin or Ethereum, where you just buy a system, plug it into the network, and it's up and running.
You know, it's a complex platform. There is software involved; there's hardware, storage, networking, internet. All those things have to work really well together to just work in the first place. And once you kind of get to that level, then the next level is to be able to actually do it cost-effectively. So it's not for the faint of heart to get into this. However, if you are successful, it's really kind of exciting once you get to that point.
So let me kind of jump in and talk. What I try to focus on is not the high end, but more the, you know, single-person, small-end miner: what is, like, you know, the bare minimum requirement, at least for now. I know the systems are changing and things are being optimized, and as Neo also mentioned, you know, he's highly recommending that the requirements come down.
But at the moment, the requirements are what they are. So as of, you know, today, October 20th, 2020, this is kind of like the minimum requirements you need to get a miner up and running. There are three major steps to the sealing process: there's pre-commit 1, pre-commit 2, and commit 2. Commit 1 is there, but it runs so fast you don't really notice it; nobody talks about it. So really, the three major steps are pre-commit 1, pre-commit 2, and commit 2. And so, on pre-commit 1: for this step, for now (I know there are actually some changes being worked on, in test at the moment, that will allow it to use multiple cores, so that will be coming down the line pretty soon), as of right now it is still heavily based on just the CPU, and there's no GPU needed at the moment for PC1 work.
And so, on the CPU side: yeah, like I said, you need an Ice Lake or newer Intel CPU if you want to go that route, but in general, most Intel CPUs don't really work well, especially on the lower-end systems. So I'd recommend that you get an AMD-based CPU; Ryzen 5s especially are really cost-effective at the moment to run. Once the multi-threaded option for PC1 becomes available, the Ryzen 7s and Ryzen 9s will probably become more popular for PC1, but for right now, a Ryzen 3 or 5 is really cost-effective.
To get your PC1 rig up and running, on the memory side, what I found is that you have to be a bit careful: a lot of the bottom-end machines will only support like 32 gigabytes, sometimes only 64 gigabytes. If you do get a motherboard, watch the chipset; I know the B450 is one of the limited ones, I think the X470 as well, but the B550 definitely does support 128 gigabytes on the motherboard. You need at least 64 gigs available for every PC1 process that you want to run, so with 128 gigabytes available, you will be able to run at least two PC1 processes on that one system. Swap space: from what I've learned, you do actually need a little bit more than what the official documentation says. I've noticed that with the way the software calculates what's available, you know, there's overhead from other processes and things already running on the machine, and you need to make sure you have space for that plus the amount Filecoin needs to run. So it's always wise to make sure you have some extra gigabytes available beyond the documented minimum. Swap space, again, should be on NVMe storage to make sure you can swap really fast.
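A quick Go sketch of the capacity math above: roughly 64 GiB of RAM per PC1 process, with swap provisioned somewhat above physical RAM (the exact margin here is our assumption):

```go
package main

import "fmt"

// Rough PC1 capacity planning from the numbers in the talk: each PC1 process
// wants about 64 GiB of RAM, and swap should exceed physical RAM a bit.
func main() {
	const ramGiB = 128
	const perPC1GiB = 64

	parallelPC1 := ramGiB / perPC1GiB
	fmt.Printf("%d GiB RAM -> up to %d parallel PC1 jobs\n", ramGiB, parallelPC1)

	// Leave headroom: the OS and other processes also consume memory, which is
	// why the talk recommends provisioning more than the documented minimum.
	swapGiB := ramGiB + 32 // assumption: a modest margin above RAM
	fmt.Printf("suggested swap on NVMe: at least %d GiB\n", swapGiB)
}
```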
Disk space: yeah, this is the part where it's doing all the work, and you certainly want to be on SSDs or NVMe for the work that's being done, the data coming in, your deals, and the different steps coming through. I know that a single 32-gigabyte sector will actually grow to about 500 gigabytes of workspace as it is doing its sealing process. So yes, you need probably a bare minimum of one terabyte; two terabytes is certainly more comfortable. I've even used four terabytes on some workers, because depending on, you know, whether some tasks are waiting for deals, or a task is waiting to get transferred to another worker, while all that waiting is happening it's good to have extra space so that the system can in the meantime still start sealing additional sectors.
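The disk sizing follows directly from that roughly 500-gigabyte workspace figure; a quick Go sketch:

```go
package main

import "fmt"

// Scratch-disk sizing from the figures in the talk: a 32 GiB sector grows to
// roughly 500 GB of workspace while sealing, so size the SSD/NVMe scratch
// disk by how many sectors you want in flight at once (plus waiting tasks).
func main() {
	const workspacePerSectorGB = 500

	for _, inFlight := range []int{2, 4, 8} {
		needTB := float64(inFlight*workspacePerSectorGB) / 1000
		fmt.Printf("%d sectors in flight -> ~%.1f TB of scratch space\n", inFlight, needTB)
	}
}
```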
Networking: I initially started with one gig; I don't recommend it. I mean, with that 500-gigabyte workspace, when a PC1 gets transferred to a PC2 worker, it's 500 gigabytes that needs to be moved around on the network, and that takes almost an hour and a half on a one-gig link. So you certainly want to look at 10 gig, and the prices have come down quite a bit on 10 gig, so it is becoming reasonably affordable. And if you only have a main miner and a few workers, you can probably even just cable them directly; you don't necessarily have to worry about having a switch, you just have to, you know, figure out your routing and such so that all the machines can reach each other. And yeah, on PC1, at the moment, no GPU is needed.
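The arithmetic behind the 10-gig recommendation is easy to check; a quick Go sketch:

```go
package main

import "fmt"

// Transfer-time arithmetic for moving a ~500 GB PC1 workspace to a PC2
// worker over 1 GbE versus 10 GbE.
func main() {
	const workspaceGB = 500.0

	for _, linkGbps := range []float64{1, 10} {
		seconds := workspaceGB * 8 / linkGbps // GB -> gigabits, divide by line rate
		fmt.Printf("%g Gbps link: ~%.0f minutes (ideal, before protocol overhead)\n",
			linkGbps, seconds/60)
	}
}
```

On paper a one-gig link needs about 67 minutes; with real-world protocol overhead that lands close to the hour and a half observed above.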
So that's kind of the PC1 process and the hardware you want to use for it. Then PC2/C2: in my environment, I actually run them on the same set of hardware, because at the moment they need similar requirements. Both processes need more threads, a reasonably moderate amount of memory, and a GPU; that's the main difference driving one set of hardware for PC1 and different hardware for PC2/C2. So again, on the CPU side: yeah, you don't want to use any CPU that's older than five years. Memory: you've got to be at 128 gigabytes or more. And in general, while you are sealing data, spinning hard drives, certainly a single hard drive, are not going to get you far very fast.
It has to be on SSDs. Maybe one comment also on SSDs: you have to be careful with really low-end, consumer-level SSDs. On one of my workers I did kind of go really low-end, and shortly after Space Race that NVMe actually died; it literally got burned up, because it can only take so many writes during its lifespan. So you do need SSDs, and you are going to need, you know, higher-level, enterprise-level SSDs that can handle quite a lot of writes and rewrites on the disk.
GPU-wise as well: you want to make sure, if you are going to go with a little bit lower-end GPU, that it has at least six gigabytes or more of memory per processor on the GPU card, because the Filecoin process allocates around four and a half gigabytes, a weird number, but it does allocate about four and a half gigabytes of memory on the graphics card. So do get a card with at least six gigabytes on it. Actually, I've seen that you sometimes still do get some errors on memory allocation even then, so it is still better to have a card that has, I think, at least 10 gigabytes of memory on it; that's kind of the sweet spot for an entry-level card. In general, on the CPU side, you do need more threads to do the work of PC2/C2, so Ryzen 5s, 7s, and 9s work really well for these. And again, swap space has got to be a little bit higher than the memory that you have in the system.
(coughs gently) Then lastly, on the PoSt: just for people starting out from scratch, I've seen it over and over and over on the message boards; this is kind of the main thing that people trip over. Once you've got, you know, PC1 and PC2 working, which is not too hard to accomplish, probably the most difficult thing is to get your window PoSt to complete successfully within the timeframe. There is the lotus-miner proving deadlines command that you run; it shows 48 sections, basically a 24-hour window of 30-minute sections, where every 30 minutes it does a proof over some of the work that's already been sealed. For the challenge that it sends, you need to be able to do all the work: retrieve the data, do the hash calculations, send the results, put them in the message pool, and send them back to the chain. And all of that needs to happen in 30 minutes.
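That schedule is easy to sanity-check: 48 deadlines of 30 minutes add up to the 24-hour proving period, and 30 minutes is the budget per window. A quick Go sketch:

```go
package main

import (
	"fmt"
	"time"
)

// The window PoSt schedule described above: a 24-hour proving period split
// into 48 deadlines of 30 minutes each. Every partition must be proven
// inside its 30-minute window or its sectors show up as faulty.
func main() {
	const deadlines = 48
	window := 30 * time.Minute

	fmt.Println("proving period:", time.Duration(deadlines)*window) // 24h0m0s
	fmt.Println("time budget per deadline:", window)
}
```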
And if you don't have adequate hardware, that's when you see faulty sectors, or you lose all your power on the network; it's usually because your PoSts are just not completing in time. I initially started with a Ryzen 5 as my main system doing the PoSt, and it was just not fast enough. I upgraded to the AMD Threadripper, which works really awesomely, but I would say, now that I've learned a bit more, it's probably overkill. So you can probably get away with a Ryzen 9 if you have a dedicated machine for the PoSt; it should do the job as well.
And again, a good GPU, at least a Navi or better, plus the memory and disk requirements above, should certainly be able to get you through your PoSts. (coughs gently) And just as a quick reference, I mean, this is kind of how I monitor my environment: I've got a whole bunch of monitoring stations up and running, and seeing what all the systems are doing, it's quite a good feeling to see that everything is actually working and to see some data.